git.openwrt.org Git - openwrt/staging/blogic.git/log

starfire: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c59x: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/qlogic/qed/qed_dcbx.c
drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Mostly small sets of driver fixes scattered all over the place.

   1) Mediatek driver fixes from Sean Wang.  Forward port not written
      correctly during TX map, missed handling of EPROBE_DEFER, and
      mistaken use of put_page() instead of skb_free_frag().

   2) Fix socket double-free in KCM code, from WANG Cong.

   3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for
      using the dcbx buffers before initializing them.

   4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for
      double fib removals and an error handling fix in
      mlxsw_sp_module_init().

   5) Fix kernel panic when enabling LLDP in i40e driver, from Dave
      Ertman.

   6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham.

   7) TCP's rcv_wup not initialized properly when using fastopen, from
      Neal Cardwell.

   8) Don't use uninitialized flow keys in flow dissector, from Gao
      Feng.

   9) Use after free in l2tp module unload, from Sabrina Dubroca.

  10) Fix interrupt registry ordering issues in smsc911x driver, from
      Jeremy Linton.

  11) Fix crashes in bonding having to do with enslaving and rx_handler,
      from Mahesh Bandewar.

  12) AF_UNIX deadlock fixes from Linus.

  13) In mlx5 driver, don't read skb->xmit_mode after it might have been
      freed from the TX reclaim path.  From Tariq Toukan.

  14) Fix a bug from 2015 in TCP Yeah where the congestion window does
      not increase, from Artem Germanov.

  15) Don't pad frames on receive in NFP driver, from Jakub Kicinski.

  16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo
      Leitner.

  17) Fix deletion of VRF routes, from Mark Tomlinson.

  18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
  net/mlx4_en: Fix panic on xmit while port is down
  net/mlx4_en: Fixes for DCBX
  net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
  net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
  net: ethernet: renesas: sh_eth: add POST registers for rz
  drivers: net: phy: mdio-xgene: Add hardware dependency
  dwc_eth_qos: do not register semi-initialized device
  sctp: identify chunks that need to be fragmented at IP level
  mlxsw: spectrum: Set port type before setting its address
  mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
  nfp: don't pad frames on receive
  nfp: drop support for old firmware ABIs
  nfp: remove linux/version.h includes
  tcp: cwnd does not increase in TCP YeAH
  net/mlx5e: Fix parsing of vlan packets when updating lro header
  net/mlx5e: Fix global PFC counters replication
  net/mlx5e: Prevent casting overflow
  net/mlx5e: Move an_disable_cap bit to a new position
  net/mlx5e: Fix xmit_more counter race issue
  tcp: fastopen: avoid negative sk_forward_alloc
  ...

Linux 4.8-rc6

Merge branch 'mlx4-fixes'

Tariq Toukan says:

====================
mlx4 fixes

This patchset contains several bug fixes from the team to the
mlx4 Eth driver.

Series generated against net commit:
c2f57fb97da5 "drivers: net: phy: mdio-xgene: Add hardware dependency"

v2:
* excluded some cleanup patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix panic on xmit while port is down

When port is down, tx drop counter update is not needed.
Updating the counter in this case can cause a kernel
panic as when the port is down, ring can be NULL.

Fixes: 63a664b7e92b ("net/mlx4_en: fix tx_dropped bug")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fixes for DCBX

This patch adds a capability check before enabling DCBX.
In addition, it re-organizes the relevant data structures,
and fixes a typo in a define.

Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()

mlx4_en_dcbnl_set_state() returns u8, the return value from
mlx4_en_setup_tc() could be negative in case of failure, so fix that.

Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()

mlx4_en_dcbnl_set_all() returns u8, so return value can't be negative in
case of failure.

Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: bcm_sf2: Get VLAN_PORT_MASK from b53_device

While migrating the bcm_sf2 driver to use b53_common, we left a small
piece untouched where we kept our local copy of the per-port
port_vlan_ctl bitmask value. This value is now maintained by b53_device
so we need to use it instead of our local (and now stale) copy of it.

Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nvme: make NVME_RDMA depend on BLOCK

Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci
driver") removed the dependency on BLK_DEV_NVME, but the cdoe does
depend on the block layer (which used to be an implicit dependency
through BLK_DEV_NVME).

Otherwise you get various errors from the kbuild test robot random
config testing when that happens to hit a configuration with BLOCK
device support disabled.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jay Freyensee <james_p_freyensee@linux.intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'staging-4.8-rc6' of git://git./linux/kernel/git/gregkh/staging

Pull IIO fixes from Greg KH:
"Here are a few small IIO fixes for 4.8-rc6.

  Nothing major, full details are in the shortlog, all of these have
  been in linux-next with no reported issues"

* tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio:core: fix IIO_VAL_FRACTIONAL sign handling
  iio: ensure ret is initialized to zero before entering do loop
  iio: accel: kxsd9: Fix scaling bug
  iio: accel: bmc150: reset chip at init time
  iio: fix pressure data output unit in hid-sensor-attributes
  tools:iio:iio_generic_buffer: fix trigger-less mode

Merge tag 'usb-4.8-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.

  All of these resolve minor issues that have been reported, and all
  have been in linux-next with no reported issues"

* tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
  xhci: fix null pointer dereference in stop command timeout function
  usb: dwc3: pci: fix build warning on !PM_SLEEP
  usb: gadget: prevent potenial null pointer dereference on skb->len
  usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
  usb: phy: phy-generic: Check clk_prepare_enable() error
  usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
  Revert "usb: dwc3: gadget: always decrement by 1"

Merge branch 'vrf-tx-hook'

David Ahern says:

====================
net: Convert vrf to tx hook

The motivation for this series is that ICMP Unreachable - Fragmentation
Needed packets are not handled properly for VRFs. Specifically, the
FIB lookup in __ip_rt_update_pmtu fails so no nexthop exception is
created with the reduced MTU. As a result connections stall if packets
larger than the smallest MTU in the path are generated.

While investigating that problem I also noticed that the MSS for all
connections in a VRF is based on the VRF device's MTU and not the
route the packets ultimately go through. VRF currently uses a dst
to direct packets to the device. The first FIB lookup returns this dst
and then the lookup in the VRF driver gets the actual output route. A
side effect of this design is that the VRF dst is cached on sockets
and then used for calculations like the MSS.

This series fixes this problem by removing the hook in the FIB lookups
that returns the dst pointing to the VRF device to the VRF and always
doing the actual FIB lookup. This allows the real dst to be used
throughout the stack (for example the MSS). Packets are diverted to
the VRF device on Tx using an l3mdev hook in the output path similar to
to what is done for Rx. The end result is a simpler implementation for
VRF with fewer intrusions into the network stack and symmetrical packet
handling for Rx and Tx paths.

Comparison of netperf performance for a build without l3mdev (best case
performance), the old vrf driver and the VRF driver from this series.
Data are collected using VMs with virtio + vhost. The netperf client
runs in the VM and netserver runs in the host. 1-byte RR tests are done
as these packets exaggerate the performance hit due to the extra lookups
done for l3mdev and VRF.

Command: netperf -cC -H ${ip} -l 60 -t {TCP,UDP}_RR [-J red]

                      TCP_RR              UDP_RR
                   IPv4     IPv6       IPv4     IPv6
no l3mdev        29,996   30,601     31,638   24,336
vrf old          27,417   27,626     29,159   24,801
vrf new          28,036   28,372     30,110   24,857
l3mdev, no vrf   29,534   30,465     30,670   24,346

* Transactions per second as reported by netperf
* netperf modified to take a bind-to-device argument -- the -J red option

1. 'no l3mdev'      == NET_L3_MASTER_DEV is unset so code is compiled out
2. 'vrf old'        == data for existing implementation
3. 'vrf new'        == data with this series
4. 'l3mdev, no vrf' == NET_L3_MASTER_DEV is enabled but traffic is not
                       going through a VRF

About the series
- patch 1 adds the flow update (changing oif or iif to L3 master device
  and setting the flag to skip the oif check) to ipv4 and ipv6 paths just
  before hitting the rules. This catches all code paths in a single spot.

- patch 2 adds the Tx hook to push the packet to the l3mdev if relevant

- patch 3 adds some checks so the vrf device can act as a vrf-local
  loopback. These changes were not needed before since the vrf dst was
  returned from the lookup.

- patches 4 and 5 flip the ipv4 and ipv6 stacks to the tx hook leaving
  the route lookup to be the real one. The dst flip happens at the
  beginning of the L3 output path so the VRFs can have device based
  features such as netfilter, tc and tcpdump.

- patches 6-11 remove no longer needed l3mdev code

v2
- properly handle IPv6 link scope addresses

- keep the device xmit path and associated dst which is switched in by
  the l3_out hook. packets still need to go through the xmit path in
  case the user puts a qdisc on the vrf device and to allow tc rules.
  version 1 short circuited the tx handling and only covered netfilter
  and tcpdump.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: flow: Remove FLOWI_FLAG_L3MDEV_SRC flag

No longer used

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: remove get_rtable method

No longer used

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: Remove l3mdev_fib_oif

No longer used

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Remove l3mdev_get_saddr6

No longer needed

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: Remove l3mdev_get_saddr

No longer needed

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: remove redundant calls

A previous patch added l3mdev flow update making these hooks
redundant. Remove them.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vrf: Flip IPv6 output path from FIB lookup hook to out hook

Flip the IPv6 output path to use the l3mdev tx out hook. The VRF dst
is not returned on the first FIB lookup. Instead, the dst on the
skb is switched at the beginning of the IPv6 output processing to
send the packet to the VRF driver on xmit.

Link scope addresses (linklocal and multicast) need special handling:
specifically the oif the flow struct can not be changed because we
want the lookup tied to the enslaved interface. ie., the source address
and the returned route MUST point to the interface scope passed in.
Convert the existing vrf_get_rt6_dst to handle only link scope addresses.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vrf: Flip IPv4 output path from FIB lookup hook to out hook

Flip the IPv4 output path to use the l3mdev tx out hook. The VRF dst
is not returned on the first FIB lookup. Instead, the dst on the
skb is switched at the beginning of the IPv4 output processing to
send the packet to the VRF driver on xmit.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: Allow the l3mdev to be a loopback

Allow an L3 master device to act as the loopback for that L3 domain.
For IPv4 the device can also have the address 127.0.0.1.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: Add hook to output path

This patch adds the infrastructure to the output path to pass an skb
to an l3mdev device if it has a hook registered. This is the Tx parallel
to l3mdev_ip{6}_rcv in the receive path and is the basis for removing
the existing hook that returns the vrf dst on the fib lookup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: flow: Add l3mdev flow update

Add l3mdev hook to set FLOWI_FLAG_SKIP_NH_OIF flag and update oif/iif
in flow struct if its oif or iif points to a device enslaved to an L3
Master device. Only 1 needs to be converted to match the l3mdev FIB
rule. This moves the flow adjustment for l3mdev to a single point
catching all lookups. It is redundant for existing hooks (those are
removed in later patches) but is needed for missed lookups such as
PMTU updates.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-ZeitNet: Fix indentation for one DPRINTK() call in start_rx()

Adjust the indentation for a call of the macro "DPRINTK" in this function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-ZeitNet: Replace one kzalloc() call by kcalloc()

* The script "checkpatch.pl" can point information out like the following.

  WARNING: Prefer kcalloc over kzalloc with multiply

  Thus fix the affected source code place.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

* Delete the local variable "size" which became unnecessary with
  this refactoring.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-ZeitNet: Improve a size determination in zatm_open()

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-ZeitNet: Use kmalloc_array() in start_tx()

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-nicstar: Refactor a dev_alloc_skb() call in dequeue_rx()

The script "checkpatch.pl" can point out that assignments should usually
not be performed within condition checks.
Thus move an assignment for a local variable to a separate statement
in this function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-nicstar: Refactor a kmalloc() call in ns_init_card()

* The script "checkpatch.pl" can point out that assignments should usually
  not be performed within condition checks.
  Thus move an assignment for a local variable to a separate statement
  in this function.

* Replace the specification of a data structure by a pointer dereference
  as the parameter for the operator "sizeof" to make the corresponding size
  determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-nicstar: Improve another size determination in ns_init_card()

Replace the specification of a data structure by a reference for a field
in a local variable as the parameter for the operator "sizeof" to make
the corresponding size determination a bit safer according to
the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-nicstar: Improve another size determination in get_scq()

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-nicstar: Use kmalloc_array() in get_scq()

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: better use ooo_last_skb in tcp_data_queue_ofo()

Willem noticed that we could avoid an rbtree lookup if the
the attempt to coalesce incoming skb to the last skb failed
for some reason.

Since most ooo additions are at the tail, this is definitely
worth adding a test and fast path.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: use alias for genetlink family names

When userspace tries to create datapaths and the module is not loaded,
it will simply fail. With this patch, the module will be automatically
loaded.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "hv_netvsc: make inline functions static"

These functions are used by other code misc-next tree.

This reverts commit 30d1de08c87ddde6f73936c3350e7e153988fe02.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-next'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 seamless error recovery

This series from Mohamad improves the driver load/unload flows
to seamlessly handle pci errors and device internal errors recovery
reset flows.

Current pci and internal error handling is too heavy and is done
with a full restart of the driver by unregistering mlx5 interfaces
(mlx5e netedevs and mlx5_ib) which will cause losing all the current
interfaces and mlx5 core configurations.

To improve this, we add new callback functions of mlx5 interface
object (attach/detach) to be called upon reset flows when errors are
detected rather than calling register and unregister interfaces.

On their side, interfaces such as (mlx5e and mlx5_ib) can choose to implement
those callback, if not, the old heavy reset will be called for that interface.

For non-interface mlx5 modules such as sriov and eswitch, we refactored
and reorganized the code in a way that the software state objects are created
only once on driver load. Those software state objects are kept upon reset recovery
flows and only freed once on driver unload. On seamless soft reset flows, only
hardware resources are released on stop and re-allocated on start according to the
current soft state.

In this series only mlx5e interface implements attach/detach callbacks
so that the netdevice will be kept alive on reset. On detach only hardware resources
are released and the netdevice will be marked as detached to the stack. Once
attached again it will re-allocate the hardware resources according to the current
netdevice state, and all the configurations and the software state will be kept or restored
after recovery.

Note: I will be out of office all next week, in case of any updates
or V2 is required, Tariq will post the new series, I hope it is ok.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Organize device list API in one place

Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose
an organized modular API to manage and manipulate mlx5 devices list.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Restore vlan filter after seamless reset

When detaching the mlx5e interface clear all the vlans rules from the
vlan flow table.
When attaching it back restore all the active vlans rules to the HW.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Implement mlx5e interface attach/detach callbacks

Needed to support seamless and lightweight PCI/Internal error recovery.
Implement the attach/detach interface callbacks.
In attach callback we only allocate HW resources.
In detach callback we only deallocate HW resources.
All SW/kernel objects initialzing/destroying is kept in add/remove
callbacks.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Implement vports admin state backup/restore

Save the user configuration in the vport sturct.
Restore saved old configuration upon vport enable.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Align sriov/eswitch modules with the new load/unload flow.

Init/cleanup sriov/eswitch in the core software context init/cleanup
flows.
Attach/detach sriov/eswitch in the core load/unload flows.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Implement eswitch attach/detach flows

Needed for lightweight and modular internal/pci error handling.
Implement eswitch attach function which allocates/starts hw related
resources.
Implement eswitch detach function which releases/stops hw related
resources.
Init/cleanup function only handle eswitch software context allocation
and destruction.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Implement SRIOV attach/detach flows

Needed for lightweight and modular internal/pci error handling.
Implement sriov attach function which enables pre-saved number of vfs on
the device side.
Implement sriov detach function which disable the current vfs on the
device side.
Init/cleanup function only handles sriov software context allocation and
destruction.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Split the load/unload flow into hardware and software flows

Gather all software context creating/destroying in one function and call
it once in the first load and in the last unload.
load/unload functions will now receive indication if we need to
create/destroy the software contexts.
In internal/pci error do the unload/load flows without releasing the
software objects.
In this way we perserve the sw core state and it help us restoring old
driver state after PCI error/shutdown.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Introduce attach/detach to interface API

Add attach/detach callbacks to interface API.
This is crucial for implementing seamless reset flow which releases the
hardware and it's resources upon detach while keeping software
structures and state (e.g netdev) then reset and reallocate the hardware
needed resources upon attach.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: SRIOV core code refactoring

Simplify the code and makes it look modular and symmetric.
Split sriov enable/disable to two levels: device level and pci level.
When user enable/disable sriov (via sriov_configure driver callback) we
will enable/disable both device and pci sriov.
When driver load/unload we will enable/disable (on demand) only device
sriov while keeping the PCI sriov enabled for next driver load.
On internal/pci error, VFs will be kept enabled on PCI and the reset
is done only in device level.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Skip waiting for vf pages in internal error

In case of device in internal error state there is no need to wait for
vf pages since they will be reclaimed manually later in the unload flow.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-is_enabled'

Javier Martinez Canillas says:

====================
net: use IS_ENABLED() instead of checking for built-in or module

This trivial series replace the open coding to check for a Kconfig symbol
being built-in or module, with IS_ENABLED() macro that does exactly that.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lec: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

appletalk: use IS_ENABLED() instead of checking for built-in or module

The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fs_enet-opt'

Christophe Leroy says:

====================
Optimisation of fs_enet ethernet driver

This set optimises the freescale fs_enet ethernet driver:
1/ Merge of RX and TX NAPI functions in order to limit the amount of
interrupts
2/ Do not unmap DMA when packets len is below copybreak, otherwise there
is no benefit in copying the skb instead of allocating a new one
3/ Make copybreak value configurable as the optimised value is not the
same on all targets
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: make rx_copybreak value configurable

Measurement shows that on a MPC8xx running at 132MHz, the optimal
limit is 112:
* 114 bytes packets are processed in 147 TB ticks with higher copybreak
* 114 bytes packets are processed in 148 TB ticks with lower copybreak
* 128 bytes packets are processed in 154 TB ticks with higher copybreak
* 128 bytes packets are processed in 148 TB ticks with lower copybreak
* 238 bytes packets are processed in 172 TB ticks with higher copybreak
* 238 bytes packets are processed in 148 TB ticks with lower copybreak

However it might be different on other processors
and/or frequencies. So it is useful to make it configurable.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: don't unmap DMA when packet len is below copybreak

When the length of the packet is below the defined copybreak limit,
the received packet is copied into a newly allocated skb in order
to reuse the skb. This is only interesting if it allow us to avoid
a new DMA mapping. We shall therefore not DMA unmap and remap the
skb->data. Instead, we invalidate the cache
with dma_sync_single_for_cpu() once the received data has been
copied into the new skb.

The following measures have been obtained on a mpc885 running at 132Mhz.
Measurement is done using the timebase with packets sent to the target
with 'ping -s 1' (packet len is 60):
* Without this patch: 182 TB ticks
* With this patch: 143 TB ticks

As a comparison, if we set the copybreak limit to 0, then we get
148 TB ticks. It means that without this patch, duration is even
worse when copying received data to a new skb instead of
allocating a new skb for next packet to be received

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: merge NAPI RX and NAPI TX

Initially, a NAPI TX routine has been implemented separately from
NAPI RX, as done on the freescale/gianfar driver.

By merging NAPI RX and NAPI TX, we reduce the amount of TX completion
interrupts.

Handling of the budget in association with TX interrupts is based on
indications provided at https://wiki.linuxfoundation.org/networking/napi
We never proceed more than the complete TX ring on a single run.

At the same time, we fix an issue in the handling of fep->tx_free:

It is only when fep->tx_free goes up to MAX_SKB_FRAGS that
we need to wake up the queue. There is no need to call
netif_wake_queue() at every packet successfully transmitted.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: renesas: sh_eth: add POST registers for rz

Due to a mistake in the hardware manual, the FWSLC and POST1-4 registers
were not documented and left out of the driver for RZ/A making the CAM
feature non-operational.
Additionally, when the offset values for POST1-4 are left blank, the driver
attempts to set them using an offset of 0xFFFF which can cause a memory
corruption or panic.

This patch fixes the panic and properly enables CAM.

Reported-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'act_tunnel_key'

Hadar Hen Zion says:

====================
net/sched: ip tunnel metadata set/release/classify by using TC

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent ffff: \
flower \
ip_proto 1 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
enc_src_ip 11.11.0.2 \
enc_dst_ip 11.11.0.1 \
enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V6:
- Add kfree_rcu to tunnel_key_release function
- Use reverse Christmas tree order in tunnel_key_init function

Changes from V5:
- Add __rcu notation to struct tcf_tunnel_key_params in struct tcf_tunnel_key
- Fix indentation in include/net/dst_metadata.h
- Fix syntx error in commit message

Changes from V4:
- Fix tunnel_key_init function error flow.
- Add 'action' variable to struct tcf_tunnel_key_params and use it instead of
  tcf_action variable which is not protected by rcu lock.

Changes from V3:
- Use percpu stats
- No spinlock on datapatch - protecting parameters with rcu
- Fix buggy handling of set/release dst
- Use nla_get_in_addr and nla_put_in_addr
- Fix change logs
- Pass in6_addr by pointer
- Rename utility functions to start with double underscore

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: Introduce act_tunnel_key

This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Classify packet in ip tunnels

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0' (after metadata is released):

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action tunnel_key release \
    action mirred egress redirect dev vnet0

The action tunnel_key, will be introduced in the next patch in this
series.

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dst: Utility functions to build dst_metadata without supplying an skb

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

Add utility functions to convert a 32 bits key into a 64 bits tunnel and
vice versa.
These functions will be used instead of cloning code in GRE and VXLAN,
and in tc act_iptunnel which will be introduced in a following patch in
this patchset.

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"nvdimm fixes for v4.8, two of them are tagged for -stable:

   - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
     DAX pmd mappings end up with an uncached pgprot, and unusable
     performance for the device-dax interface.  The device-dax interface
     appeared in 4.7 so this is tagged for -stable.

   - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
     understand DAX pmd entries.  This fix is tagged for -stable.

   - Fix a mis-merge of the nfit machine-check handler to flip the
     polarity of an if() to match the final version of the patch that
     Vishal sent for 4.8-rc1.  Without this the nfit machine check
     handler never detects / inserts new 'badblocks' entries which
     applications use to identify lost portions of files.

   - For test purposes, fix the nvdimm_clear_poison() path to operate on
     legacy / simulated nvdimm memory ranges.  Without this fix a test
     can set badblocks, but never clear them on these ranges.

   - Fix the range checking done by dax_dev_pmd_fault().  This is not
     tagged for -stable since this problem is mitigated by specifying
     aligned resources at device-dax setup time.

  These patches have appeared in a next release over the past week.  The
  recent rebase you can see in the timestamps was to drop an invalid fix
  as identified by the updated device-dax unit tests [1].  The -mm
  touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
   https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: allow legacy (e820) pmem region to clear bad blocks
  nfit, mce: Fix SPA matching logic in MCE handler
  mm: fix cache mode of dax pmd mappings
  mm: fix show_smap() for zone_device-pmd ranges
  dax: fix mapping size check

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Mostly driver bugfixes, but also a few cleanups which are nice to have
  out of the way"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rk3x: Restore clock settings at resume time
  i2c: Spelling s/acknowedge/acknowledge/
  i2c: designware: save the preset value of DW_IC_SDA_HOLD
  Documentation: i2c: slave-interface: add note for driver development
  i2c: mux: demux-pinctrl: run properly with multiple instances
  i2c: bcm-kona: fix inconsistent indenting
  i2c: rcar: use proper device with dma_mapping_error
  i2c: sh_mobile: use proper device with dma_mapping_error
  i2c: mux: demux-pinctrl: invalidate properly when switching fails

Merge tag 'for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull fscrypto fixes fromTed Ts'o:
"Fix some brown-paper-bag bugs for fscrypto, including one one which
  allows a malicious user to set an encryption policy on an empty
  directory which they do not own"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fscrypto: require write access to mount to set encryption policy
  fscrypto: only allow setting encryption policy on directories
  fscrypto: add authorization check for setting encryption policy

fscrypto: require write access to mount to set encryption policy

Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem. This was handled correctly by f2fs but not by ext4. Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>

ATM-iphase: Use kmalloc_array() in tx_init()

* Multiplications for the size determination of memory allocations
  indicated that array data structures should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of data types by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'alx-msix'

Tobias Regnery says:

====================
alx: add msi-x support

This patchset adds msi-x support to the alx driver. It is a preparatory
series for multi queue support, which I am currently working on. As there
is no advantage over msi interrupts without multi queue support, msi-x
interrupts are disabled by default. In order to test for regressions, a
new module parameter is added to enable msi-x interrupts.

Based on information of the downstream driver at github.com/qca/alx
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

alx: add module parameter to enable msi-x support

msi-x support is default disabled in the alx driver. In order to test msi-x
interrupts for regressions add a module parameter to the driver.

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

alx: add msi-x support

Add msi-x support to the alx driver. This is in preparation for multi queue
support.

msi-x interrupts are disabled by default because without multi queue support
there is no advantage over msi interrupts. The performance numbers observed
with iperf stay the same.

Based on information of the downstream driver at github.com/qca/alx

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

alx: factor out part of the interrupt handler

Factor out the handling of misc interrupts into a new function.
This function can be reused later for msi-x interrupts.

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

alx: refactor msi enablement and disablement

Introduce a new flag field for the advanced interrupt capatibilities and add
new functions to enable and disable msi interrupts. These functions will be
extended later to cover msi-x interrupts.

We enable msi interrupts earlier in alx_init_intr because with msi-x and multi
queue support the number of queues must be set before we allocate resources for
the rx and tx paths.

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fscrypto: only allow setting encryption policy on directories

The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files. This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.

In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.

This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later. It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.

This patch restores the !S_ISDIR() check that was present in older
kernels.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

fscrypto: add authorization check for setting encryption policy

On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access.  This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy.  This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

qed: mark symbols static where possible

We get a few warnings when building kernel with W=1:
drivers/net/ethernet/qlogic/qed/qed_l2.c:112:5: warning: no previous prototype for 'qed_sp_vport_start' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:110:6: warning: no previous prototype for 'qed_iov_is_valid_vfid' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:188:5: warning: no previous prototype for 'qed_iov_post_vf_bulletin' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:578:6: warning: no previous prototype for 'qed_iov_set_vfs_to_disable' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:1135:28: warning: no previous prototype for 'qed_iov_get_public_vf_info' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:1148:6: warning: no previous prototype for 'qed_iov_clean_vf' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:2444:5: warning: no previous prototype for 'qed_iov_chk_ucast' [-Wmissing-prototypes]
drivers/net/ethernet/qlogic/qed/qed_sriov.c:2762:5: warning: no previous prototype for 'qed_iov_vf_flr_cleanup' [-Wmissing-prototypes]
....

In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
so this patch marks these functions with 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-helper-cleanups'

Daniel Borkmann says:

====================
Some BPF helper cleanups

This series contains a couple of misc cleanups and improvements
for BPF helpers. For details please see individual patches. We
let this also sit for a few days with Fengguang's kbuild test
robot, and there were no issues seen (besides one false positive,
see last one for details).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add BPF_CALL_x macros for declaring helpers

This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
that are used today. Motivation for this is to hide all the register handling
and all necessary casts from the user, so that it is done automatically in the
background when adding a BPF_CALL_<n>() call.

This makes current helpers easier to review, eases to write future helpers,
avoids getting the casting mess wrong, and allows for extending all helpers at
once (f.e. build time checks, etc). It also helps detecting more easily in
code reviews that unused registers are not instrumented in the code by accident,
breaking compatibility with existing programs.

BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
fundamental differences, for example, for generating the actual helper function
that carries all u64 regs, we need to fill unused regs, so that we always end up
with 5 u64 regs as an argument.

I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and
they look all as expected. No sparse issue spotted. We let this also sit for a
few days with Fengguang's kbuild test robot, and there were no issues seen. On
s390, it barked on the "uses dynamic stack allocation" notice, which is an old
one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
to the call wrapper, just telling that the perf raw record/frag sits on stack
(gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
and they were fine as well. All eBPF helpers are now converted to use these
macros, getting rid of a good chunk of all the raw castings.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add own ctx rewriter on ifindex for clsact progs

When fetching ifindex, we don't need to test dev for being NULL since
we're always guaranteed to have a valid dev for clsact programs. Thus,
avoid this test in fast path.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros

Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit
which otherwise often result in overly long bytes_to_bpf_size(sizeof())
and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
check in convert_bpf_extensions(), but we should rather make that generic
as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
users to detect any rewriter size issues at compile time. Note, there are
currently none, but we want to assert that it stays this way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: minor cleanups in helpers

Some minor misc cleanups, f.e. use sizeof(__u32) instead of hardcoding
and in __bpf_skb_max_len(), I missed that we always have skb->dev valid
anyway, so we can drop the unneeded test for dev; also few more other
misc bits addressed here.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnel: do not clear l4 hashes

If skb has a valid l4 hash, there is no point clearing hash and force
a further flow dissection when a tunnel encapsulation is added.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: phy: mdio-xgene: Add hardware dependency

The mdio-xgene driver is only useful on X-Gene SoC.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Iyappan Subramanian <isubramanian@apm.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-ForeRunnerHE: Use kmalloc_array() in he_init_group()

* Multiplications for the size determination of memory allocations
  indicated that array data structures should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of data types by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ATM-ENI: Use kmalloc_array() in eni_start()

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rxrpc-rewrite-20160908' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Rewrite data and ack handling

This patch set constitutes the main portion of the AF_RXRPC rewrite.  It
consists of five fix/helper patches:

(1) Fix ASSERTCMP's and ASSERTIFCMP's handling of signed values.

(2) Update some protocol definitions slightly.

(3) Use of an hlist for RCU purposes.

(4) Removal of per-call sk_buff accounting (not really needed when skbs
     aren't being queued on the main queue).

(5) Addition of a tracepoint to log incoming packets in the data_ready
     callback and to log the end of the data_ready callback.

And then there are two patches that form the main part:

(6) Preallocation of resources for incoming calls so that in patch (7) the
     data_ready handler can be made to fully instantiate an incoming call
     and make it live.  This extends through into AFS so that AFS can
     preallocate its own incoming call resources.

     The preallocation size is capped at the listen() backlog setting - and
     that is capped at a sysctl limit which can be set between 4 and 32.

     The preallocation is (re)charged either by accepting/rejecting pending
     calls or, in the case of AFS, manually.  If insufficient preallocation
     resources exist, a BUSY packet will be transmitted.

     The advantage of using this preallocation is that once a call is set
     up in the data_ready handler, DATA packets can be queued on it
     immediately rather than the DATA packets being queued for a background
     work item to do all the allocation and then try and sort out the DATA
     packets whilst other DATA packets may still be coming in and going
     either to the background thread or the new call.

(7) Rewrite the handling of DATA, ACK and ABORT packets.

     In the receive phase, DATA packets are now held in per-call circular
     buffers with deduplication, out of sequence detection and suchlike
     being done in data_ready.  Since there is only one producer and only
     once consumer, no locks need be used on the receive queue.

     Received ACK and ABORT packets are now parsed and discarded in
     data_ready to recycle resources as fast as possible.

     sk_buffs are no longer pulled, trimmed or cloned, but rather the
     offset and size of the content is tracked.  This particularly affects
     jumbo DATA packets which need insertion into the receive buffer in
     multiple places.  Annotations are kept to track which bit is which.

     Packets are no longer queued on the socket receive queue; rather,
     calls are queued.  Dummy packets to convey events therefore no longer
     need to be invented and metadata packets can be discarded as soon as
     parsed rather then being pushed onto the socket receive queue to
     indicate terminal events.

     The preallocation facility added in (6) is now used to set up incoming
     calls with very little locking required and no calls to the allocator
     in data_ready.

     Decryption and verification is now handled in recvmsg() rather than in
     a background thread.  This allows for the future possibility of
     decrypting directly into the user buffer.

     With this patch, the code is a lot simpler and most of the mass of
     call event and state wangling code in call_event.c is gone.

With this, the majority of the AF_RXRPC rewrite is complete.  However,
there are still things to be done, including:

(*) Limit the number of active service calls to prevent an attacker from
     filling up a server's memory.

(*) Limit the number of calls on the rebuff-with-BUSY queue.

(*) Transmit delayed/deferred ACKs from recvmsg() if possible, rather than
     punting to the background thread.  Ideally, the background thread
     shouldn't run at all, but data_ready can't call kernel_sendmsg() and
     we can't rely on recvmsg() attending to the call in a timely fashion.

(*) Prevent the call at the front of the socket queue from hogging
     recvmsg()'s attention if there's a sufficiently continuous supply of
     data.

(*) Distribute ICMP errors by connection rather than by call.  Possibly
     parse the ICMP packet to try and pin down the exact connection and
     call.

(*) Encrypt/decrypt directly between user buffers and socket buffers where
     possible.

(*) IPv6.

(*) Service ID upgrade.  This is a facility whereby a special flag bit is
     set in the DATA packet header when making a call that tells the server
     that it is allowed to change the service ID to an upgraded one and
     reply with an equivalent call from the upgraded service.

     This is used, for example, to override certain AFS calls so that IPv6
     addresses can be returned.

(*) Allow userspace to preallocate call user IDs for incoming calls.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-for-davem-2016-09-08' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.8

iwlwifi

* fix P2P dump trigger
* prevent a potential null dereference in iwlmvm
* prevent an uninitialized value from being returned in iwlmvm
* advertise support for channel width change in AP mode

ath10k

* fix racy rx status retrieval from htt context
* QCA9887 support is not experimental anymore, remove the warning message

ath9k

* fix regression with led GPIOs
* fix AR5416 GPIO access warning

brcmfmac

* avoid potential stack overflow in brcmf_cfg80211_start_ap()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dwc_eth_qos: do not register semi-initialized device

We move register_netdev() to the end of dwceqos_probe() to close any
races where the netdev callbacks are called before the initialization
has finished.

Reported-by: Pavel Andrianov <andrianov@ispras.ru>
Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: identify chunks that need to be fragmented at IP level

Previously, without GSO, it was easy to identify it: if the chunk didn't
fit and there was no data chunk in the packet yet, we could fragment at
IP level. So if there was an auth chunk and we were bundling a big data
chunk, it would fragment regardless of the size of the auth chunk. This
also works for the context of PMTU reductions.

But with GSO, we cannot distinguish such PMTU events anymore, as the
packet is allowed to exceed PMTU.

So we need another check: to ensure that the chunk that we are adding,
actually fits the current PMTU. If it doesn't, trigger a flush and let
it be fragmented at IP level in the next round.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

via-velocity: remove null pointer check on array tdinfo->skb_dma

tdinfo->skb_dma is a 7 element array of dma_addr_t hence cannot be
null, so the pull pointer check on tdinfo->skb_dma is redundant.
Remove it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: mark qede_set_features() static

We get 1 warning when building kernel with W=1:
drivers/net/ethernet/qlogic/qede/qede_main.c:2113:5: warning: no previous prototype for 'qede_set_features' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is
declared and don't need a declaration, but can be made static.
so this patch marks this function with 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Fixed checkpatch errors for Microsemi PHYs.

The existing VSC85xx PHY driver did not follow the coding style and caused "checkpatch" to complain. This commit fixes this.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: x25: remove null checks on arrays calling_ae and called_ae

dtefacs.calling_ae and called_ae are both 20 element __u8 arrays and
cannot be null and hence are redundant checks. Remove these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

libnvdimm: allow legacy (e820) pmem region to clear bad blocks

Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation
where legacy pmem is being used or a pmem region created by using memmap
kernel parameter, the injected bad blocks are not cleared due to
nvdimm_clear_poison() failing from lack of ndctl function pointer. In
this case we need to just return as handled and allow the bad blocks to
be cleared rather than fail.

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>