Yury Norov [Mon, 7 Dec 2015 05:00:32 +0000 (10:30 +0530)]
net: thunderx: nicvf_queues: nivc_*_intr: remove duplication
The same switch-case repeates for nivc_*_intr functions.
In this patch it is moved to a helper nicvf_int_type_to_mask().
By the way:
- Unneeded write to NICVF register dropped if int_type is unknown.
- netdev_dbg() is used instead of netdev_err().
Signed-off-by: Yury Norov <yury.norov@auriga.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Acked-by: Vadim Lomovtsev <Vadim.Lomovtsev@caiumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Sun, 6 Dec 2015 21:11:38 +0000 (21:11 +0000)]
af_unix: fix unix_dgram_recvmsg entry locking
The current unix_dgram_recvsmg code acquires the u->readlock mutex in
order to protect access to the peek offset prior to calling
__skb_recv_datagram for actually receiving data. This implies that a
blocking reader will go to sleep with this mutex held if there's
presently no data to return to userspace. Two non-desirable side effects
of this are that a later non-blocking read call on the same socket will
block on the ->readlock mutex until the earlier blocking call releases it
(or the readers is interrupted) and that later blocking read calls
will wait longer than the effective socket read timeout says they
should: The timeout will only start 'ticking' once such a reader hits
the schedule_timeout in wait_for_more_packets (core.c) while the time it
already had to wait until it could acquire the mutex is unaccounted for.
The patch avoids both by using the __skb_try_recv_datagram and
__skb_wait_for_more packets functions created by the first patch to
implement a unix_dgram_recvmsg read loop which releases the readlock
mutex prior to going to sleep and reacquires it as needed
afterwards. Non-blocking readers will thus immediately return with
-EAGAIN if there's no data available regardless of any concurrent
blocking readers and all blocking readers will end up sleeping via
schedule_timeout, thus honouring the configured socket receive timeout.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Sun, 6 Dec 2015 21:11:34 +0000 (21:11 +0000)]
core: enable more fine-grained datagram reception control
The __skb_recv_datagram routine in core/ datagram.c provides a general
skb reception factility supposed to be utilized by protocol modules
providing datagram sockets. It encompasses both the actual recvmsg code
and a surrounding 'sleep until data is available' loop. This is
inconvenient if a protocol module has to use additional locking in order
to maintain some per-socket state the generic datagram socket code is
unaware of (as the af_unix code does). The patch below moves the recvmsg
proper code into a new __skb_try_recv_datagram routine which doesn't
sleep and renames wait_for_more_packets to
__skb_wait_for_more_packets, both routines being exported interfaces. The
original __skb_recv_datagram routine is reimplemented on top of these
two functions such that its user-visible behaviour remains unchanged.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 7 Dec 2015 03:38:58 +0000 (04:38 +0100)]
PHY: DP83867: Remove looking in parent device for OF properties
Device tree properties for a phy device are expected to be in the phy
node. The current code for the DP83867 also tries to look in the
parent node. The devices binding documentation does not mention this,
no current device tree file makes use of this, and it is not behaviour
we want. So remove looking in the parent device.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 6 Dec 2015 21:47:15 +0000 (22:47 +0100)]
net: cdc_ncm: add "ndp_to_end" sysfs attribute
Adding a writable sysfs attribute for the "NDP to end"
quirk flag.
This makes it easier for end users to test new devices for
this firmware bug. We've been lucky so far, but we should
not depend on reporters capable of rebuilding the driver.
Cc: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Dec 2015 03:40:46 +0000 (22:40 -0500)]
Merge branch 'mlx4-HA-LAG-SRIOV-VF'
Or Gerlitz says:
====================
Add HA and LAG support for mlx4 SRIOV VFs
This series is built upon the code added in commit
ce388ff "Merge branch
'mlx4-next'" which added HA and LAG support to mlx4 RoCE and SRIOV services.
We add HA and Link Aggregation support to single ported mlx4 Ethernet VFs.
In this case, the PF Ethernet interfaces are bonded, the VFs see single
port HW devices (already supported) -- however, now this port is highly
available. This means that all VF HW QPs (both VF Ethernet driver and VF
RoCE / RAW QPs) are subject to the V2P (Virtual-To-Physical) mapping which
is managed by the PF driver, and hence resilient across link failures and
such events.
When bonding operates in Dynamic link aggregation (802.3ad) mode, traffic
from each VF will go over the VF "base port" (the port this VF is assigned
to by the admin) as long as this port is up. When the port fails, traffic
from all VFs that are defined on this port will move to the other port, and
be back to their base-port when it recovers.
Moni and Or.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:43 +0000 (18:07 +0200)]
net/mlx4_core: Support the HA mode for SRIOV VFs too
When the mlx4 driver runs in HA mode, and all VFs are single ported
ones, we make their single port Highly-Available.
This is done by taking advantage of the HA mode properties (following
bonding changes with programming the port V2P map, etc) and adding
the missing parts which are unique to SRIOV such as mirroring VF
steering rules on both ports.
Due to limits on the MAC and VLAN table this mode is enabled only when
number of total VFs is under 64.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 6 Dec 2015 16:07:42 +0000 (18:07 +0200)]
IB/mlx4: Use the VF base-port when demuxing mad from wire
Under HA mode, it's possible that the VF registered its GID
(and expects to get mads through the PV scheme) on a port which is
different from the one this mad arrived on, due to HA fail over.
Therefore, if the gid is not matched on the port that the packet arrived
on, check for a match on the other port if HA mode is active -- and if a
match is found on the other port, continue processing the mad using that
other port.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:41 +0000 (18:07 +0200)]
net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode
Due to HW limitations, indexes to MAC and VLAN tables are always taken
from the table of the actual port. So, if a resource holds an index to
a table, it may refer to different values during the lifetime of the
resource, unless the tables are mirrored. Also, even when
driver is not in HA mode the policy of allocating an index to these
tables is such to make sure, as much as possible, that when the time
comes the mirroring will be successful. This means that in multifunction
mode the allocation of a free index in a port's table tries to make sure
that the same index in the other's port table is also free.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:40 +0000 (18:07 +0200)]
net/mlx4_core: Support mirroring VF DMFS rules on both ports
Under HA mode, steering rules set by VFs should be mirrored on both
ports of the device so packets will be accepted no matter on which
port they arrived.
Since getting into HA mode is done dynamically when the user bonds mlx4
Ethernet netdevs, we keep hold of the VF DMFS rule mbox with the port
value flipped (1->2,2->1) and execute the mirroring when getting into
HA mode. Later, when going out of HA mode, we unset the mirrored rules.
In that context note that mirrored rules cannot be removed explicitly.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:39 +0000 (18:07 +0200)]
net/mlx4_core: Use both physical ports to dispatch link state events to VF
Under HA mode, the link down event should be sent to VFs only if both
ports are down.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 6 Dec 2015 16:07:38 +0000 (18:07 +0200)]
net/mlx4_core: Use both physical ports to set the VF link state
In HA mode, the link state for VFs for which the policy is "auto"
(i.e. follow the physical link state) should be ORed from both ports.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 6 Dec 2015 05:56:23 +0000 (06:56 +0100)]
VSOCK: fix returnvar.cocci warnings
Remove unneeded variable used to store return value.
Generated by: scripts/coccinelle/misc/returnvar.cocci
CC: Asias He <asias@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 6 Dec 2015 20:25:50 +0000 (21:25 +0100)]
net: qmi_wwan: should hold RTNL while changing netdev type
The notifier calls were thrown in as a last-minute fix for an
imagined "this device could be part of a bridge" problem. That
revealed a certain lack of locking. Not to mention testing...
Avoid this splat:
RTNL: assertion failed at net/core/dev.c (1639)
CPU: 0 PID: 4293 Comm: bash Not tainted 4.4.0-rc3+ #358
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000000 ffff8800ad253d60 ffffffff8122f7cf ffff8800ad253d98
ffff8800ad253d88 ffffffff813833ab 0000000000000002 ffff880230f48560
ffff880230a12900 ffff8800ad253da0 ffffffff813833da 0000000000000002
Call Trace:
[<
ffffffff8122f7cf>] dump_stack+0x4b/0x63
[<
ffffffff813833ab>] call_netdevice_notifiers_info+0x3d/0x59
[<
ffffffff813833da>] call_netdevice_notifiers+0x13/0x15
[<
ffffffffa09be227>] raw_ip_store+0x81/0x193 [qmi_wwan]
[<
ffffffff8131e149>] dev_attr_store+0x20/0x22
[<
ffffffff811d858b>] sysfs_kf_write+0x49/0x50
[<
ffffffff811d8027>] kernfs_fop_write+0x10a/0x151
[<
ffffffff8117249a>] __vfs_write+0x26/0xa5
[<
ffffffff81085ed4>] ? percpu_down_read+0x53/0x7f
[<
ffffffff81174c9e>] ? __sb_start_write+0x5f/0xb0
[<
ffffffff81174c9e>] ? __sb_start_write+0x5f/0xb0
[<
ffffffff81172c37>] vfs_write+0xa3/0xe7
[<
ffffffff811734ad>] SyS_write+0x50/0x7e
[<
ffffffff8145c517>] entry_SYSCALL_64_fastpath+0x12/0x6f
Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Singhai, Anjali [Fri, 4 Dec 2015 07:49:31 +0000 (23:49 -0800)]
Revert "i40e: remove CONFIG_I40E_VXLAN"
This reverts commit
8fe269991aece394a7ed274f525d96c73f94109a.
The case where VXLAN is a module and i40e driver is inbuilt
will not be handled properly with this change since i40e
will have an undefined symbol vxlan_get_rx_port in it.
v2: Add a signed-off-by.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 6 Dec 2015 16:14:06 +0000 (11:14 -0500)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2015-12-05
This series contains updates to fm10k only.
Jacob provides the remaining fm10k patches in the series. First change
ensures that all the logic regarding the setting of netdev features is
consolidated in one place of the driver. Fixed an issue where an assumption
was being made on how many queues are available, especially when init_hw_vf()
errors out. Fixed up an number of issues with init_hw() where failures
were not being handled properly or at all, so update the driver to check
returned error codes and respond appropriately. Fixed up typecasting
issues found where either the incorrect typecast size was used or
explicitly typecast values. Added additional debugging statistics and
rename statistic to better reflect its true value. Added support for
ITR scaling based on PCIe link speed for fm10k. Fixed up code comment
where "hardware" was misspelled.
v2: Dropped patches #1 and #10 from original submission, patch #1 was from
Nick Krause and due to his past kernel interactions, dropping his patch.
Patch #10 had questions and concerns from Tom Herbert which cannot be
addressed at this time since the author (Jacob Keller) is currently on
sabbatical, so dropping this patch for now until we can properly address
Tom's questions and concerns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Fri, 16 Oct 2015 17:57:11 +0000 (10:57 -0700)]
fm10k: TRIVIAL cleanup order at top of fm10k_xmit_frame
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:10 +0000 (10:57 -0700)]
fm10k: TRIVIAL fix typo of hardware
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:09 +0000 (10:57 -0700)]
fm10k: change default Tx ITR to 25usec
The current default ITR for Tx is overly restrictive. Using a simple
netperf TCP_STREAM test, we top out at about 10Gb/s for a single thread
when running using 1500 byte frames. By reducing the ITR value to 25usec
(up to 40K interrupts a second from 10K), we are able to achieve 36Gb/s
for a single thread TCP stream test.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:08 +0000 (10:57 -0700)]
fm10k: use macro for default Tx and Rx ITR values
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:07 +0000 (10:57 -0700)]
fm10k: Update adaptive ITR algorithm
The existing adaptive ITR algorithm is overly restrictive. It throttles
incorrectly for various traffic rates, and does not produce good
performance. The algorithm now allows for more interrupts per second,
and does some calculation to help improve for smaller packet loads. In
addition, take into account the new itr_scale from the hardware which
indicates how much to scale due to PCIe link speed.
Reported-by: Matthew Vick <matthew.vick@intel.com>
Reported-by: Alex Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:06 +0000 (10:57 -0700)]
fm10k: introduce ITR_IS_ADAPTIVE macro
Define a macro for identifying when the itr value is dynamic or
adaptive. The concept was taken from i40e. This helps make clear what
the check is, and reduces the line length to something more reasonable
in a few places.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:05 +0000 (10:57 -0700)]
fm10k: Add support for ITR scaling based on PCIe link speed
The Intel Ethernet Switch FM10000 Host Interface interrupt throttle
timers are based on the PCIe link speed. Because of this, the value
being programmed into the ITR registers must be scaled accordingly.
For the PF, this is as simple as reading the PCIe link speed and storing
the result. However, in the case of SR-IOV, the VF's interrupt throttle
timers are based on the link speed of the PF. However, the VF is unable
to get the link speed information from its configuration space, so the
PF must inform it of what scale to use.
Rather than pass this scale via mailbox message, take advantage of
unused bits in the TDLEN register to pass the scale. It is the
responsibility of the PF to program this for the VF while setting up the
VF queues and the responsibility of the VF to get the information
accordingly. This is preferable because it allows the VF to set up the
interrupts properly during initialization and matches how the MAC
address is passed in the TDBAL/TDBAH registers.
Since we're modifying fm10k_type.h, we may as well also update the
copyright year.
Reported-by: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:03 +0000 (10:57 -0700)]
fm10k: rename mbx_tx_oversized statistic to mbx_tx_dropped
Originally this statistic was renamed because the method of dropping was
called "drop_oversized_messages", but this logic has changed much, and
this counter does actually represent messages which we failed to
transmit for a number of reasons. Rename the counter back to tx_dropped
since this is when it will increment, and it is less confusing.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:02 +0000 (10:57 -0700)]
fm10k: add statistics for actual DWORD count of mbmem mailbox
A previous bug was uncovered by addition of a debug stat to indicate the
actual number of DWORDS we pulled from the mbmem. It turned out this was
not the same as the tx_dwords counter. While the previous bug fix should
have corrected this in all cases, add some debug stats that count the
number of DWORDs pushed or pulled from the mbmem. A future debugger may
take advantage of this statistic for debugging purposes. Since we're
modifying fm10k_mbx.h, update the copyright year as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:01 +0000 (10:57 -0700)]
fm10k: explicitly typecast vlan values to u16
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:00 +0000 (10:57 -0700)]
fm10k: Correct typecast in fm10k_update_xc_addr_pf
Since the resultant data type of the mac_update.mac_upper field is u16,
it does not make sense to typecast u8 variables to u32 first. Since
we're modifying fm10k_pf.c, also update the copyright year.
Reported-by: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:59 +0000 (10:56 -0700)]
fm10k: reinitialize queuing scheme after calling init_hw
The init_hw function may fail, and in the case of VFs, it might change
the number of maximum queues available. Thus, for every flow which
checks init_hw, we need to ensure that we clear the queue scheme before,
and initialize it after. The fm10k_io_slot_reset path will end up
triggering a reset so fm10k_reinit needs this change. The
fm10k_io_error_detected and fm10k_io_resume also need to properly clear
and reinitialize the queue scheme.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:58 +0000 (10:56 -0700)]
fm10k: always check init_hw for errors
A recent change modified init_hw in some flows the function may fail on
VF devices. For example, if a VF doesn't yet own its own queues.
However, many callers of init_hw didn't bother to check the error code.
Other callers checked but only displayed diagnostic messages without
actually handling the consequences.
Fix this by (a) always returning and preventing the netdevice from going
up, and (b) printing the diagnostic in every flow for consistency. This
should resolve an issue where VF drivers would attempt to come up
before the PF has finished assigning queues.
In addition, change the dmesg output to explicitly show the actual
function that failed, instead of combining reset_hw and init_hw into a
single check, to help for future debugging.
Fixes: 1d568b0f6424 ("fm10k: do not assume VF always has 1 queue")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:57 +0000 (10:56 -0700)]
fm10k: reset max_queues on init_hw_vf failure
VF drivers must detect how many queues are available. Previously, the
driver assumed that each VF has at minimum 1 queue. This assumption is
incorrect, since it is possible that the PF has not yet assigned the
queues to the VF by the time the VF checks. To resolve this, we added a
check first to ensure that the first queue is infact owned by the VF at
init_hw_vf time. However, the code flow did not reset hw->mac.max_queues
to 0. In some cases, such as during reinit flows, we call init_hw_vf
without clearing the previous value of hw->mac.max_queues. Due to this,
when init_hw_vf errors out, if its error code is not properly handled
the VF driver may still believe it has queues which no longer belong to
it. Fix this by clearing the hw->mac.max_queues on exit due to errors.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:56 +0000 (10:56 -0700)]
fm10k: set netdev features in one location
Don't change netdev hw_features later in fm10k_probe, instead set all
values inside fm10k_alloc_netdev. To do so, we need to know the MAC type
(whether it is PF or VF) in order to determine what to do. This helps
ensure that all logic regarding features is co-located.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Sun, 6 Dec 2015 03:45:56 +0000 (22:45 -0500)]
Merge branch 'renesas-read-mac'
Sergei Shtylyov says:
====================
Renesas: read MAC address registers only once
Here's 2 patches against DaveM's 'net-next.git' repo. Here we optimize
the MAC address register reading (left over from a bootloader).
[1/2] ravb: read MAC address registers only once
[2/2] sh_eth: read MAC address registers only once
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 4 Dec 2015 21:58:57 +0000 (00:58 +0300)]
sh_eth: read MAC address registers only once
The code reading the MAHR/MALR registers in read_mac_address() is terribly
ineffective -- it reads MAHR 4 times and MALR 2 times, while it's enough to
read each register only once. Use the local variables to achieve that,
somewhat beautifying the code while at it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 4 Dec 2015 21:58:07 +0000 (00:58 +0300)]
ravb: read MAC address registers only once
The code reading the MAHR/MALR registers in ravb_read_mac_address() is
terribly ineffective -- it reads MAHR 4 times and MALR 2 times, while
it's enough to read each register only once. Use the local variables to
achieve that, somewhat beautifying the code while at it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 6 Dec 2015 00:00:03 +0000 (19:00 -0500)]
Merge branch 'bnx2x'
Michal Schmidt says:
====================
bnx2x: fewer error messages, simplification
This removes one redundant error message in bnx2x and changes another one to
WARN_ONCE. The third patch is a small simplification in ethtool stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:36 +0000 (17:22 +0100)]
bnx2x: simplify distinction between port and func stats
The 'flags' field in bnx2x_stats_arr[] serves only one purpose - to tell
us if the statistic is a per-port stat and thus should not be shown for
virtual functions. It's strange that the field can have three different
values. A boolean will do just fine.
Also remove IS_FUNC_STAT(). It was used only once and it's in fact just
a negation of IS_PORT_STAT().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:35 +0000 (17:22 +0100)]
bnx2x: change FW GRO error message to WARN_ONCE
It's supposed to be impossible for TPA to give us anything else
than IPv4 or IPv6 here. But in case there is a way to reach this error
by some strange received frames, we don't want to flood the kernel log.
WARN_ONCE is better for this purpose.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:34 +0000 (17:22 +0100)]
bnx2x: drop redundant error message about allocation failure
alloc_pages() already prints a warning when it fails. No need to emit
another message. Certainly not at KERN_ERR level, because it is no big
deal if this GFP_ATOMIC allocation fails occasionally.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 4 Dec 2015 14:01:31 +0000 (15:01 +0100)]
net: constify netif_is_* helpers net_device param
As suggested by Eric, these helpers should have const dev param.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Pieczko [Fri, 4 Dec 2015 08:48:39 +0000 (08:48 +0000)]
sfc: check warm_boot_count after other functions have been reset
A change in MCFW behaviour means that the net driver must update its record
of the warm_boot_count by reading it from the ER_DZ_BIU_MC_SFT_STATUS
register.
On v4.6.x MCFW the global boot count was incremented when some functions
needed to be reset to enable multicast chaining, so all functions saw the
same value. In that case, the driver needed to increment its
warm_boot_count when other functions were reset, to avoid noticing it later
and then trying to reset itself to recover unnecessarily.
With v4.7+ MCFW, the boot count in firmware doesn't change as that is
unnecessary since the PFs that have been reset will each receive an MC
reboot notification. In that case, the driver re-reads the unchanged
value.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Fri, 4 Dec 2015 07:43:19 +0000 (08:43 +0100)]
atm: solos-pci: Replace simple_strtol by kstrtoint
The simple_strtol function is obsolete.
This patch replace it by kstrtoint.
This will simplify code, since some error case not handled by
simple_strtol are handled by kstrtoint.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 5 Dec 2015 22:41:42 +0000 (17:41 -0500)]
Merge branch 'batman-hdlc'
Andrew Lunn says:
====================
Allow BATMAN to use hdlc-eth interfaces
BATMAN works over Ethernet like interfaces. hdlc-eth provides the need
requirements. However, hdlc devices are often created as raw hdlc
devices, which batman cannot use, and are then be transmuted into
other types using sethdlc(1). Have the HDLC code emit
NETDEV_*_TYPE_CHANGE events when the type changes, and have BATMAN
react on these events.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:33 +0000 (21:12 +0100)]
batman-adv: Act on NETDEV_*_TYPE_CHANGE events
A network interface can change type. It may change from a type which
batman does not support, e.g. hdlc, to one it does, e.g. hdlc-eth.
When an interface changes type, it sends two notifications. Handle
these notifications.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:32 +0000 (21:12 +0100)]
ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addresses
An interface changing type may not have IPv6 addresses. Don't
call the address configuration type change in this case.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:31 +0000 (21:12 +0100)]
WAN: HDLC: Call notifiers before and after changing device type
An HDLC device can change type when the protocol driver is changed.
Calling the notifier change allows potential users of the interface
know about this planned change, and even block it. After the change
has occurred, send a second notification to users can evaluate the new
device type etc.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:30 +0000 (21:12 +0100)]
WAN: HDLC: Detach protocol before unregistering device
The current code first unregisters the device, and then detaches the
protocol from it. This should be performed the other way around, since
the detach may try to use state which has been freed by the
unregister. Swap the order, so that we first detach and then remove the
netdev.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 21:56:23 +0000 (16:56 -0500)]
Merge branch 'qmi_wwan_MDM9x30'
Bjørn Mork says:
====================
net: qmi_wwan: MDM9x30 support
We add new device IDs all the time, often without any testing on
actual hardware. This is usually OK as long as the device is similar
to already supported devices, using the same chipset and firmware
basis. But the Sierra Wireless MC7455 is an example of a new chipset
generation. Adding it based on assumed similarity with its ancestors
proved too optimistic.
This series adds the missing bits and pieces necessary to support LTE
Advanced modems based on the Qualcomm MDM9x30 chipset. A big thanks to
Sierra Wireless for providing MC7455 samples for testing
The most important change is the "raw-ip" support. The series also
adds a necessary control request, removes an unsupported device ID,
and adds a driver specific entry in MAINTAINERS.
A few random notes about "raw-ip":
"I rather have these all running in raw IP mode. The 802.3 framing is
utterly stupid." - Marcel Holtmann in Jan 2012 [1]
Marcel was right. I should have listened to him. What more can I say?
The 802.3 framing has provided a steady supply of firmware bugs for
many years. We've added driver workarounds for many of these, but
there are still known bugs where the workaround is so yucky that we
have refused to apply it. But all that is over now. The latest
generation Qualcomm chips no longer supports 802.3 framing at all.
I had two open questions regarding the "raw-ip" userspace API:
1) Should we continue faking an ethernet device, even if we don't use
the L2 headers on the USB link anymore?
There was a vote in favour of the "headerless" device. This is the
honest representation of the hardware/firmware interface.
2) What input should the driver base its framing on?
Snooping or directly manipulating QMI is considered out of the
question. We delegated all QMI handling to userspace from the
beginning.
We have so far required userspace to configure the firmware for
"802.3" framing, or fail if that proved impossible. This
requirement is now changed. Userspace must now inform the driver
if it negotiates "raw-ip" framing. Two alternative interfaces were
proposed:
- ethtool private driver flag, or
- sysfs file
The NetworkManager/ModemManager developers were in favour of the
sysfs alternative.
These questions (or any other you migh have :) are of course still
open. This patch set presents the solutions I currently prefer,
considering the above.
All comments are appreciated, even simple '+1' ones.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:23 +0000 (19:24 +0100)]
MAINTAINERS: add qmi_wwan driver entry
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:22 +0000 (19:24 +0100)]
net: qmi_wwan: document the qmi/raw_ip sysfs file
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:21 +0000 (19:24 +0100)]
net: qmi_wwan: support "raw IP" mode
QMI wwan devices have traditionally emulated ethernet devices
by default. But they have always had the capability of operating
without any L2 header at all, transmitting and receiving "raw"
IP packets over the USB link. This firmware feature used to be
configurable through the QMI management protocol.
Traditionally there was no way to verify the firmware mode
without attempting to change it. And the firmware would often
disallow changes anyway, i.e. due to a session already being
established. In some cases, this could be a hidden firmware
internal session, completely outside host control. For these
reasons, sticking with the "well known" default mode was safest.
But newer generations of QMI hardware and firmware have moved
towards defaulting to "raw IP" mode instead, followed by an
increasing number of bugs in the already buggy "802.3" firmware
implementation. At the same time, the QMI management protocol
gained the ability to detect the current mode. This has enabled
the userspace QMI management application to verify the current
firmware mode without trying to modify it.
Following this development, the latest QMI hardware and firmware
(the MDM9x30 generation) has dropped support for "802.3" mode
entirely. Support for "raw IP" framing in the driver is therefore
necessary for these devices, and to a certain degree to work
around problems with the previous generation,
This patch adds support for "raw IP" framing for QMI devices,
changing the netdev from an ethernet device to an ARPHRD_NONE
p-t-p device when "raw IP" framing is enabled.
The firmware setup is fully delegated to the QMI userspace
management application, through simple tunneling of the QMI
protocol. The driver will therefore not know which mode has been
"negotiated" between firmware and userspace. Allowing userspace
to inform the driver of the result through a sysfs switch is
considered a better alternative than to change the well established
clean delegation of firmware management to userspace.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:20 +0000 (19:24 +0100)]
usbnet: allow mini-drivers to consume L2 headers
Assume the minidriver has taken care of all L2 header parsing
if it sets skb->protocol. This allows the minidriver to
support non-ethernet L2 headers, and even operate without
any L2 header at all.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:19 +0000 (19:24 +0100)]
net: qmi_wwan: remove 1199:9070 device id
This turned out to be a bootloader device ID. No need for
that in this driver. It will only provide a single serial
function.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:18 +0000 (19:24 +0100)]
net: qmi_wwan: MDM9x30 specific power management
MDM9x30 based modems appear to go into a deeper sleep when
suspended without "Remote Wakeup" enabled. The QMI interface
will not respond unless a "set DTR" control request is sent
on resume. The effect is similar to a QMI_CTL SYNC request,
resetting (some of) the firmware state.
We allow userspace sessions to span multiple character device
open/close sequences. This means that userspace can depend
on firmware state while both the netdev and the character
device are closed. We have disabled "needs_remote_wakeup" at
this point to allow devices without remote wakeup support to
be auto-suspended.
To make sure the MDM9x30 keeps firmware state, we need to
keep "needs_remote_wakeup" always set. We also need to
issue a "set DTR" request to enable the QMI interface.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 19:36:16 +0000 (14:36 -0500)]
Merge branch 'hip06-soc'
Salil Mehta says:
====================
net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
This PATCH V7 addresses the TAB formatting comments by
Sergei Shtylyov. Missing TABs at some other palces have
also been corrected.
PATCH V6:
This addresses the review comments provided by
David Miller over the existing use of ENABLE/DISABLE
hash defines with the code. These hash defines are doing
a similar job as implicit type bool would do. So these are
kind of duplicate and are redundant.
PATCH V5:
This PATCH addresses the review comments by Yuval Mintz
<Yuval.Mintz@qlogic.com>. This rework of comments are basically
related to:
1) styling of the code,
2) RSS default Key initiailization related code
3) redundant code removal
PATCH V4:
This addresses the review comment provided by
Sergei Shtylyov. The changelog of every patch has also
been modified.
PATCH V3:
Addresses the review comment floated by David Miller
PATCH V2:
1) Bug Fixes and Clean-up: Internally identified
2) Addresses internal review comments by Kenneth Lee and
by Huang Daode
3) Addresses the review comment from "Yisen.Zhuang(Zhuangyuzeng)"
4) Adds fix from Fengguang Wu for an error generated from
"kbuild test robot" from Intel
5) Ethtool support for TSO set option from Lisheng
PATCH V1:
Adds initial support of Hip06 SoC with below changes:
This patch-set adds support of new Hisilicon Hip06 SoC to the existing
(already part of net-next) HNS ethernet driver for Hip05 SoC. Hip06 is
a multi-core SoC and is a derivative of Hip05 SoC with lots of new
hardware featres supported like RSS, TSO, hardware VLAN assist etc.
The changes in the driver are mainly due to following:
1) changes in the DMA descriptor provided by the Hip06 ethernet
hardware. These changes need to co-exist with already present
Hip05 DMA descriptor and its operating functions. The decision
to choose the correct type of DMA descriptor is taken dynamically
depending upon the version of the hardware (i.e. V1/hip05 or
V2/hip06, see already existing hisilicon-hns-nic.txt binding file
for the detailed description version and naming).
2) To support new features added to the Hip06 ethernet hardware:
a. RSS (Receive Side Scaling)
b. TSO (TCP Segment Offload)
c. Hardware VLAN support (currently we are initializing hardware
to not assist in stripping the vlan tag at hardware level.
Proper support of this feature and ethtool would come after
these patches have been accepted)
Kindly note that, this patchset has been based on latest net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:57 +0000 (12:17 +0000)]
net:hns: Add the init code to disable Hip06 "Hardware VLAN assist"
This patch adds the initializzation code to disable the hardware
vlan support for VLAN Tag stripping by default for now.
Proper support of "hardware VLAN assitance" feature would
soon come in the next coming patches.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:56 +0000 (12:17 +0000)]
net:hns: Add support of ethtool TSO set option for Hip06 in HNS
This patch adds the support of ethtool TSO option to support
Hip06 SoC to HNS
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:55 +0000 (12:17 +0000)]
net:hns: Add Hip06 "TSO(TCP Segment Offload)" support HNS Driver
This patch adds the support of "TSO (TCP Segment Offload)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.
Enabling this feature would help offload the TCP Segmentation
process to the Hip06 ethernet hardware. This eventually would help
in saving precious cpu cycles.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:54 +0000 (12:17 +0000)]
net:hns: Add Hip06 "RSS(Receive Side Scaling)" support to HNS Driver
This patch adds the support of "RSS (Receive Side Scaling)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.
This feature helps in distributing the different flows (mapped as
hash by hardware using Toeplitz Hash) to different Queues asssociated
with the processor cores. The mapping of flow-hash values to the
different queues is stored in indirection table (which is per Packet-
parse-Engine/PPE). This patch also provides the changes to re-program
the (flow-hash<->Qid) mapping using the ethtool.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Kenneth Lee <liguozhu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:53 +0000 (12:17 +0000)]
net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
This patchset adds support of Hisilicon Hip06 SoC to the existing HNS
ethernet driver.
The changes in the driver are mainly due to changes in the DMA
descriptor provided by the Hip06 ethernet hardware. These changes
need to co-exist with already present Hip05 DMA descriptor and its
operating functions. The decision to choose the correct type of DMA
descriptor is taken dynamically depending upon the version of the
hardware (i.e. V1/hip05 or V2/hip06, see already existing
hisilicon-hns-nic.txt binding file for detailed description). other
changes includes in SBM, DSAF and PPE modules as well. Changes
affecting the driver related to the newly added ethernet hardware
features in Hip06 would be added as separate patch over this and
subsequent patches.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
yzhu1 [Thu, 3 Dec 2015 10:00:55 +0000 (18:00 +0800)]
net: bonding: remove redudant brackets
It is not necessary to use two brackets. As such, the redudant brackets
are removed.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 02:03:21 +0000 (21:03 -0500)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/ethernet/renesas/ravb_main.c
kernel/bpf/syscall.c
net/ipv4/ipmr.c
All three conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 4 Dec 2015 00:02:46 +0000 (16:02 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"A lot of Thanksgiving turkey leftovers accumulated, here goes:
1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.
2) IDs for some new iwlwifi chips, from Oren Givon.
3) Fix rtlwifi lockups on boot, from Larry Finger.
4) Fix memory leak in fm10k, from Stephen Hemminger.
5) We have a route leak in the ipv6 tunnel infrastructure, fix from
Paolo Abeni.
6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.
7) Wrong lockdep annotations in tcp md5 support, fix from Eric
Dumazet.
8) Work around some middle boxes which prevent proper handling of TCP
Fast Open, from Yuchung Cheng.
9) TCP repair can do huge kmalloc() requests, build paged SKBs
instead. From Eric Dumazet.
10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
Borkmann.
11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
Nikolay Aleksandrov.
12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
Weikusat.
13) Fix double free in VRF code, from Nikolay Aleksandrov.
14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.
15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.
16) Fix clearing of persistent array maps in bpf, from Daniel
Borkmann.
17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
early enough. From Eric Dumazet.
18) Fix out of bounds accesses in bpf array implementation when
updating elements, from Daniel Borkmann.
19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
Dumazet.
20) When dumping proxy neigh entries, we have to accomodate NULL
device pointers properly, from Konstantin Khlebnikov.
21) SCTP doesn't release all ipv6 socket resources properly, fix from
Eric Dumazet.
22) Prevent underflows of sch->q.qlen for multiqueue packet
schedulers, also from Eric Dumazet.
23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
Huang and Michael Chan.
24) Don't actively scan radar channels, from Antonio Quartulli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
net: phy: reset only targeted phy
bnxt_en: Setup uc_list mac filters after resetting the chip.
bnxt_en: enforce proper storing of MAC address
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
net: lpc_eth: remove irq > NR_IRQS check from probe()
net_sched: fix qdisc_tree_decrease_qlen() races
openvswitch: fix hangup on vxlan/gre/geneve device deletion
ipv4: igmp: Allow removing groups from a removed interface
ipv6: sctp: implement sctp_v6_destroy_sock()
arm64: bpf: add 'store immediate' instruction
ipv6: kill sk_dst_lock
ipv6: sctp: add rcu protection around np->opt
net/neighbour: fix crash at dumping device-agnostic proxy entries
sctp: use GFP_USER for user-controlled kmalloc
sctp: convert sack_needed and sack_generation to bits
ipv6: add complete rcu protection around np->opt
bpf: fix allocation warnings in bpf maps and integer overflow
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
net: mvneta: enable setting custom TX IP checksum limit
net: mvneta: fix error path for building skb
...
Linus Torvalds [Thu, 3 Dec 2015 23:45:16 +0000 (15:45 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.
In more detail, this pull request contains:
- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.
- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.
- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.
- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.
- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.
- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs
Linus Torvalds [Thu, 3 Dec 2015 23:23:17 +0000 (15:23 -0800)]
Merge tag 'trace-v4.4-rc3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"During the merge window I added a new file that is used to filter
trace events on pids. It filters all events where only tasks with
their pid in that file exists. It also handles the sched_switch and
sched_wakeup trace events where the current task does not have its pid
in the file, but the task either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both
of these tracepoints use the same class as the sched_wakeup
tracepoint, and they too should be included in what gets filtered by
the set_event_pid file"
* tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
David S. Miller [Thu, 3 Dec 2015 20:56:22 +0000 (15:56 -0500)]
Merge tag 'mac80211-for-davem-2015-12-02' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 3 Dec 2015 09:12:03 +0000 (10:12 +0100)]
mlxsw: core: Change BUG to WARN in hwmon code
Better to just warn the user that something really odd is going on and
continue to run.
Suggested-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jérôme Pouiller [Thu, 3 Dec 2015 09:02:35 +0000 (10:02 +0100)]
net: phy: reset only targeted phy
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.
Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Boyd [Thu, 3 Dec 2015 07:55:15 +0000 (23:55 -0800)]
stmmac: ipq806x: Return error values instead of pointers
Typically we return error pointers when we want to use those
pointers in the non-error case, but this function is just
returning error pointers or NULL for success. Change the style to
plain int to follow normal kernel coding styles.
Cc: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Wed, 2 Dec 2015 20:19:37 +0000 (15:19 -0500)]
tipc: fix node reference count bug
Commit
5405ff6e15f40f2f ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.
This commit fixes this bug.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 20:18:10 +0000 (15:18 -0500)]
Merge branch 'mvneta-ethtool-autoneg'
Stas Sergeev says:
====================
mvneta: implement ethtool autonegotiation control
These 2 patches add an ability to control the
autonegotiation via ethtool. For example:
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
This is needed if you want to connect the mvneta's MII
to different switches or PHYs: the ones the do support
the in-band status, and the ones that do not.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stas Sergeev [Wed, 2 Dec 2015 17:35:11 +0000 (20:35 +0300)]
mvneta: implement ethtool autonegotiation control
This patch allows to do
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
to disable or enable autonegotiation at run-time.
Without that functionality, the only way to control the autonegotiation
is to modify the device tree.
This is needed if you plan to use the same kernel with
different ethernet switches, the ones that support the in-band
status and the ones that not.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stas Sergeev [Wed, 2 Dec 2015 17:33:56 +0000 (20:33 +0300)]
mvneta: consolidate autoneg enabling
This moves autoneg-related bit manipulations to the single place.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:29 +0000 (17:30 +0100)]
net: mv643xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:28 +0000 (17:30 +0100)]
net: mpc52xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:27 +0000 (17:30 +0100)]
net: bcm63xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:26 +0000 (17:30 +0100)]
net: bfin_mac: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Wed, 2 Dec 2015 15:27:39 +0000 (16:27 +0100)]
pppox: use standard module auto-loading feature
* Register PF_PPPOX with pppox module rather than with pppoe,
so that pppoe doesn't get loaded for any PF_PPPOX socket.
* Register PX_PROTO_* with standard MODULE_ALIAS_NET_PF_PROTO()
instead of using pppox's own naming scheme.
* While there, add auto-loading feature for pptp.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 20:07:14 +0000 (15:07 -0500)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: set mac address and uc_list bug fixes.
Fix ndo_set_mac_address() for PF and VF.
Re-apply uc_list after chip reset.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 2 Dec 2015 06:54:08 +0000 (01:54 -0500)]
bnxt_en: Setup uc_list mac filters after resetting the chip.
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters. Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.
Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrey Huang [Wed, 2 Dec 2015 06:54:07 +0000 (01:54 -0500)]
bnxt_en: enforce proper storing of MAC address
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW. For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrey Huang [Wed, 2 Dec 2015 06:54:06 +0000 (01:54 -0500)]
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 20:05:56 +0000 (15:05 -0500)]
Merge branch 'vsock-virtio'
Stefan Hajnoczi says:
====================
Add virtio transport for AF_VSOCK
v2:
* Rebased onto Linux v4.4-rc2
* vhost: Refuse to assign reserved CIDs
* vhost: Refuse guest CID if already in use
* vhost: Only accept correctly addressed packets (no spoofing!)
* vhost: Support flexible rx/tx descriptor layout
* vhost: Add missing total_tx_buf decrement
* virtio_transport: Fix total_tx_buf accounting
* virtio_transport: Add virtio_transport global mutex to prevent races
* common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
semantics)
* common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
* common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
* common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
* common: Fix peer_buf_alloc inheritance on child socket
This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors. It is currently only implemented for VMware's VMCI transport.
This series implements the proposed virtio-vsock device specification from
here:
http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855
Most of the work was done by Asias He and Gerd Hoffmann a while back. I have
picked up the series again.
The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock
Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.
virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM and SOCK_DGRAM semantics. Applications on the host can easily
connect to guest agents because the sockets API allows multiple connections to
a listen socket (unlike virtio-serial). This simplifies the guest<->host
communication and eliminates the need for extra processes on the host to
arbitrate virtio-serial ports.
Overview
--------
This series adds 3 pieces:
1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko
2. virtio_transport.ko - guest driver
3. drivers/vhost/vsock.ko - host driver
Howto
-----
The following kernel options are needed:
CONFIG_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_VHOST_VSOCK=m
Launch QEMU as follows:
# qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3
Guest and host can communicate via AF_VSOCK sockets. The host's CID (address)
is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
bind to it).
Status
------
There are a few design changes I'd like to make to the virtio-vsock device:
1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
Spoofing packets is also impossible so the security aspects of the 3-way
handshake (including syn cookie) add nothing. The next version will have a
single operation to establish a connection.
2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple clients
can transmit to the same listen socket. There is no way for the clients to
coordinate buffer space with each other fairly. The next version will drop
credit-based flow control for SOCK_DGRAM and only rely on best-effort
delivery. SOCK_STREAM still has guaranteed delivery.
3. In the next version only the host will be able to establish connections
(i.e. to connect to a guest agent). This is for security reasons since
there is currently no ability to provide host services only to certain
guests. This also matches how AF_VSOCK works on modern VMware hypervisors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Wed, 2 Dec 2015 06:44:03 +0000 (14:44 +0800)]
VSOCK: Add Makefile and Kconfig
Enable virtio-vsock and vhost-vsock.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Wed, 2 Dec 2015 06:44:02 +0000 (14:44 +0800)]
VSOCK: Introduce vhost-vsock.ko
VM sockets vhost transport implementation. This module runs in host
kernel.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Wed, 2 Dec 2015 06:44:01 +0000 (14:44 +0800)]
VSOCK: Introduce virtio-vsock.ko
VM sockets virtio transport implementation. This module runs in guest
kernel.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Wed, 2 Dec 2015 06:44:00 +0000 (14:44 +0800)]
VSOCK: Introduce virtio-vsock-common.ko
This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Wed, 2 Dec 2015 06:43:59 +0000 (14:43 +0800)]
VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 2 Dec 2015 06:18:11 +0000 (22:18 -0800)]
mpls: support for dead routes
Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.
Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).
dead routes:
-----------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
$ip link set dev swp1 down
$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2
linkdown routes:
----------------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
/* carrier goes down */
$ip link show dev swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Zapolskiy [Wed, 2 Dec 2015 06:12:13 +0000 (08:12 +0200)]
net: lpc_eth: remove irq > NR_IRQS check from probe()
If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.
This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.
Fixes a runtime problem:
lpc-eth
31060000.ethernet: error getting resources.
lpc_eth: lpc-eth: not found (-6).
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 20:01:09 +0000 (15:01 -0500)]
Merge branch 'rsvb-compat-strings'
Simon Horman says:
====================
ravb: More compatibility strings
this short series adds generic gen2 and gen3, and soc-specific
compatibility strings for the missing gen2 SoCs.
Key Changes in v2:
* Include "rcar-" in generic bindings
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 2 Dec 2015 05:58:33 +0000 (14:58 +0900)]
ravb: add device tree support for r8a779[123]
Simply document new compatibility strings.
As a previous patch adds a generic R-Car Gen2 compatibility string
there appears to be no need for a driver updates.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 2 Dec 2015 05:58:32 +0000 (14:58 +0900)]
ravb: add fallback compatibility strings
Add fallback compatibility strings for R-Car Gen 2 & 3 SoC Families.
This is in keeping with the fallback scheme being adopted wherever appropriate
for drivers for Renesas SoCs.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 2 Dec 2015 04:08:51 +0000 (20:08 -0800)]
net_sched: fix qdisc_tree_decrease_qlen() races
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.
One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[ 681.774821] PAX: sch->q.qlen: 0 n: 1
[ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.
201511282239-1-grsec #1
[ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[ 681.774956]
ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[ 681.774959]
ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[ 681.774960]
ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[ 681.774962] Call Trace:
[ 681.774967] [<
ffffffffa95d2810>] dump_stack+0x4c/0x7f
[ 681.774970] [<
ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[ 681.774972] [<
ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[ 681.774976] [<
ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[ 681.774978] [<
ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[ 681.774980] [<
ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[ 681.774983] [<
ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[ 681.774985] [<
ffffffffa90664c1>] __do_softirq+0xf1/0x200
[ 681.774987] [<
ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[ 681.774989] [<
ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[ 681.774991] [<
ffffffffa9089560>] ? sort_range+0x40/0x40
[ 681.774992] [<
ffffffffa9085fe4>] kthread+0xe4/0x100
[ 681.774994] [<
ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[ 681.774995] [<
ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.
A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.
Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.
Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.
A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Tue, 1 Dec 2015 21:45:15 +0000 (22:45 +0100)]
net: ipv6: restrict hop_limit sysctl setting to range [1; 255]
Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.
Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.
Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.conf.eth0.hop_limit=0 will not make the kernel use
net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
kinds of confusion, reject zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 1 Dec 2015 17:33:36 +0000 (18:33 +0100)]
openvswitch: fix hangup on vxlan/gre/geneve device deletion
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kazuya Mizuguchi [Tue, 1 Dec 2015 17:04:39 +0000 (02:04 +0900)]
ravb: ptp: Add CONFIG mode support
This patch makes PTP support active in CONFIG mode on R-Car Gen3.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 19:17:51 +0000 (14:17 -0500)]
Merge branch 'netronome-NFP4000-NFP6000'
Jakub Kicinski says:
====================
Netronome NFP4000/NFP6000 NIC VF driver
This patchset adds support for VFs of Netronome's NFP-4000 and NFP-6000
based NICs. We are currently also preparing the submission for the PF
driver, but it is not quite ready yet. The PF driver can be found on
GitHub:
https://github.com/Netronome/nfp-drv-kmods
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 1 Dec 2015 14:55:22 +0000 (14:55 +0000)]
net: add driver for Netronome NFP4000/NFP6000 NIC VFs
Add driver for Virtual Functions for the Netronome's
NFP-4000 and NFP-6000 based NICs.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 1 Dec 2015 14:55:21 +0000 (14:55 +0000)]
pci_ids: add Netronome Systems vendor
Add PCI vendor id for Netronome Systems.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 17:11:00 +0000 (12:11 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2015-12-03
This series contains updates to i40e and i40evf only.
Mitch updates the i40evf driver by increasing the maximum number of queues,
since future devices will allow for more queue pairs. Cleans up a
duplicate printing of the driver info string done in init, since it is
already done in probe. Cleaned up the several allocations which did
not need to be at atomic level, where GFP_KERNEL would work just fine.
Then makes i40e_sync_vsi_filters() a more mature function, make having
a common exit point so it will properly release the busy lock on the VSI
and propagate errors to the callers. Then does some whitespace
housekeeping in i40evf.
Kiran moves and updates the detection/recovery of transmit queue hang code
to service_task from tx_timeout function. Also fixed memory leak when
users program flow-director filter using ethtool (sideband filter
programming), the cause being the check of 'tx_buffer->skb' was preventing
'raw_buf' from being freed as part of the cleanup.
Jesse enabled the ability to turn off/on packet split using ethtool priv
flags. Then does some housekeeping for both the i40e and i40evf drivers
which includes: remove unused/useless code, correct whitespace, remove
duplicate #include, fix incorrect comment, etc...
Neerav cleans up functions to gather Flow Control Rx XOFF stats, since
the recent change in the driver logic for checking transmit hang has been
moved, so these functions do not do anything meaningful any longer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>