Thomas Gleixner [Tue, 18 Mar 2014 17:19:13 +0000 (17:19 +0000)]
can: c_can: Reduce register access
commit
4ce78a838c (can: c_can: Speed up rx_poll function) hyped a
performance improvement by reducing the access to the interrupt
pending register from a dual 16 bit to a single 16 bit access. Wow!
Thereby it crippled the driver to cast the 16 msg objects in stone,
which is completly braindead as contemporary hardware has up to 128
message objects. Supporting larger object buffers is a major surgery,
but it'd be definitely worth it especially as the driver does not
support HW message filtering ....
The logic of the "FIFO" implementation is to split the FIFO in half.
For the lower half we read the buffers and clear the interrupt pending
bit, but keep the newdat bit set, so the HW will queue above those
buffers.
When we read out the last low buffer then we reenable all the low half
buffers by clearing the newdat bit.
The upper half buffers clear the newdat and the interrupt pending bit
right away as we know that the lower half bits are clear and give us a
headstart against the hardware.
Now the implementation is:
transfer_message_object()
read_object_and_put_into_skb();
if (obj < END_OF_LOW_BUF)
clear_intpending(obj)
else if (obj > END_OF_LOW_BUF)
clear_intpending_and_newdat(obj)
else if (obj == END_OF_LOW_BUF)
clear_newdat_of_all_low_objects()
The hardware allows to avoid most of the mess simply because we can
tell the transfer_message_object() function to clear bits right away.
So we can be clever and do:
if (obj <= END_OF_LOW_BUF)
ctrl = TRANSFER_MSG | CLEAR_INTPND;
else
ctrl = TRANSFER_MSG | CLEAR_INTPND | CLEAR_NEWDAT;
transfer_message_object(ctrl)
read_object_and_put_into_skb();
if (obj == END_OF_LOW_BUF)
clear_newdat_of_all_low_objects()
So we save a complete control operation on all message objects except
the one which is the end of the low buffer. That's a few micro seconds
per object.
I'm not adding a boasting profile to that, simply because it's self
explaining.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 18:27:42 +0000 (19:27 +0100)]
can: c_can: Make the code readable
If every other line contains line breaks, that's a clear sign for
indentation level madness. Split out the inner loop and move the code
to a separate function. gcc creates slightly worse code for that, but
we'll fix that in the next step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:12 +0000 (17:19 +0000)]
can: c_can: Provide protection in the xmit path
The network core does not serialize the access to the hardware. The
xmit related code lets the following happen:
CPU0 CPU1
interrupt()
do_poll()
c_can_do_tx()
Fiddle with HW and xmit()
internal data Fiddle with HW and
internal data
due the complete lack of serialization.
Add proper locking.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:11 +0000 (17:19 +0000)]
can: c_can: Remove EOB exit
The rx_poll code has the following gem:
if (msg_ctrl_save & IF_MCONT_EOB)
return num_rx_pkts;
The EOB bit is the indicator for the hardware that this is the last
configured FIFO object. But this object can contain valid data, if we
manage to free up objects before the overrun case hits.
Now if the code exits due to the EOB bit set, then this buffer is
stale and the interrupt bit and NewDat bit of the buffer are still
set. Results in a nice interrupt storm unless we come into an overrun
situation where the MSGLST bit gets set.
ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val:
00008001 pend
00008001
ksoftirqd/0-3 [000] ..s. 79.124176: c_can_poll: rx_poll: val:
00008000 pend
00008000
ksoftirqd/0-3 [000] ..s. 79.124187: c_can_poll: rx_poll: val:
00008002 pend
00008002
ksoftirqd/0-3 [000] ..s. 79.124256: c_can_poll: rx_poll: val:
00008000 pend
00008000
ksoftirqd/0-3 [000] ..s. 79.124267: c_can_poll: rx_poll: val:
00008000 pend
00008000
The amazing thing is that the check of the MSGLST (aka overrun bit)
used to be after the check of the EOB bit. That was "fixed" in commit
5d0f801a2c(can: c_can: Fix RX message handling, handle lost message
before EOB). But the author of this "fix" did not even understand that
the EOB check is broken as well.
Again a simple solution: Remove
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:10 +0000 (17:19 +0000)]
can: c_can: Fix the lost message handling
The lost message handling is broken in several ways.
1) Clearing the message lost flag is done by writing 0 to the
message control register of the object.
#define IF_MCONT_CLR_MSGLST (0 << 14)
That clears the object buffer configuration in the worst case,
which results in a loss of the EOB flag. That leaves the FIFO chain
without a limit and causes a complete lockup of the HW
2) In case that the error skb allocation fails, the code happily
claims that it handed down a packet. Just an accounting bug, but ....
3) The code adds a lot of pointless overhead to that error case, where
we need to get stuff done as fast as possible to avoid more packet
loss.
- printk an annoying error message
- reread the object buffer for nothing
Fix is simple again:
- Use the already known MSGCTRL content and only clear the MSGLST bit
- Fix the buffer accounting by adding a proper return code
- Remove the pointless operations
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:10 +0000 (17:19 +0000)]
can: c_can: Fix buffer ordering
The buffer handling of c_can has been broken forever. That leads to
message reordering:
ksoftirqd/0-3 [000] ..s. 79.123776: c_can_poll: rx_poll: val:
00007fff
ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val:
00008001
What happens is:
CPU HW
queue new packet into obj 16 (0-15 are busy)
read obj 1-15
return because pending is 0
set pending obj 16 -> pending reg 8000
queue new packet into obj 1
set pending obj 1 -> pending reg 8001
So the current algorithmus reads the newest message first, which
violates the ordering rules of CAN.
Add proper handling of that situation by analyzing the contents of the
pending register for gaps.
This does NOT fix the message object corruption which can lead to
interrupt storms. Thats addressed in the next patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:09 +0000 (17:19 +0000)]
can: c_can: Make it SMP safe
The hardware has two message control interfaces, but the code only uses the
first one. So on SMP the following can be observed:
CPU0 CPU1
rx_poll()
write IF1 xmit()
write IF1
write IF1
That results in corrupted message object configurations. The TX/RX is not
globally serialized it's only serialized on a core.
Simple solution: Let RX use IF1 and TX use IF2 and all is good.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:08 +0000 (17:19 +0000)]
can: c_can: Fix hardware raminit function
The function is broken in several ways:
- The function does not wait for the init to complete.
That can take quite some microseconds.
- No protection against being called for two chips at the same
time. SMP is such a new thing, right?
Clear the start and the init done bit unconditionally and wait for both bits to
be clear.
In the enable path set the init bit and wait for the init done bit.
Add proper locking.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Gleixner [Tue, 18 Mar 2014 17:19:08 +0000 (17:19 +0000)]
can: c_can: Wait for CONTROL_INIT to be cleared
According to the documentation the CPU must wait for CONTROL_INIT to
be cleared before writing to the baudrate registers.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Tue, 18 Mar 2014 18:06:01 +0000 (19:06 +0100)]
can: c_can: check return value to users of c_can_set_bittiming()
This patch adds return value checking to all direct and indirect users of
c_can_set_bittiming().
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Tue, 18 Mar 2014 18:13:59 +0000 (19:13 +0100)]
can: c_can: free_c_can_dev(): add missing netif_napi_del()
This patch adds the missing netif_napi_del() to the free_c_can_dev() function.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Robert Schwebel [Wed, 19 Mar 2014 08:49:42 +0000 (09:49 +0100)]
can: Documentation: fix parameter name "sample-point"
This patch fixes the name of the parameter to configure the sample point used
in iproute2's ip command. The correct writing is "sample-point" not
"sample_point".
Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Bjorn Van Tilt [Mon, 24 Mar 2014 14:32:08 +0000 (15:32 +0100)]
can: usb_8dev: Fix memory leak in usb_8dev_start_xmit
Fixed a memory leak when an error occurred in the transmit function. In the
error handling the urb wasn't freed before returning. There was also a call to
the usb_unanchor_urb() function but the urb wasn't anchored.
Signed-off-by: Bjorn Van Tilt <bjorn.vantilt@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alexander Aring [Mon, 31 Mar 2014 01:26:51 +0000 (03:26 +0200)]
at86rf230: mask irq's before deregister device
While transmit over a at86rf231 device and unloading the module I got:
[ 29.643073] WARNING: CPU: 0 PID: 3 at kernel/workqueue.c:1335 __queue_work+0xb4/0x224()
[ 29.651457] Modules linked in: at86rf230(-) autofs4
[ 29.656612] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W
3.14.0-rc6-01602-g902659e-dirty #294
[ 29.666490] [<
c00124f0>] (unwind_backtrace) from [<
c0010ad0>] (show_stack+0x10/0x14)
[ 29.674628] [<
c0010ad0>] (show_stack) from [<
c0032c80>] (warn_slowpath_common+0x60/0x80)
[ 29.683116] [<
c0032c80>] (warn_slowpath_common) from [<
c0032d30>] (warn_slowpath_null+0x18/0x20)
[ 29.692329] [<
c0032d30>] (warn_slowpath_null) from [<
c0045b08>] (__queue_work+0xb4/0x224)
[ 29.700906] [<
c0045b08>] (__queue_work) from [<
c0045cc8>] (queue_work_on+0x50/0x78)
[ 29.708944] [<
c0045cc8>] (queue_work_on) from [<
c05669cc>] (mac802154_tx+0x1e4/0x240)
[ 29.717164] [<
c05669cc>] (mac802154_tx) from [<
c0471814>] (dev_hard_start_xmit+0x2f0/0x43c)
[ 29.725926] [<
c0471814>] (dev_hard_start_xmit) from [<
c04878d0>] (sch_direct_xmit+0x64/0x2a0)
[ 29.734867] [<
c04878d0>] (sch_direct_xmit) from [<
c0487c38>] (__qdisc_run+0x12c/0x18c)
[ 29.743169] [<
c0487c38>] (__qdisc_run) from [<
c046e1b0>] (net_tx_action+0xe0/0x178)
[ 29.751205] [<
c046e1b0>] (net_tx_action) from [<
c0036690>] (__do_softirq+0x100/0x264)
[ 29.759420] [<
c0036690>] (__do_softirq) from [<
c0036818>] (run_ksoftirqd+0x24/0x4c)
[ 29.767453] [<
c0036818>] (run_ksoftirqd) from [<
c005232c>] (smpboot_thread_fn+0x128/0x13c)
[ 29.776121] [<
c005232c>] (smpboot_thread_fn) from [<
c004c3fc>] (kthread+0xd0/0xe4)
[ 29.784061] [<
c004c3fc>] (kthread) from [<
c000da88>] (ret_from_fork+0x14/0x2c)
[ 29.791628] ---[ end trace
3406ff24bd973834 ]---
The problem is there are still interrupts after deregister ieee802154
device. This patch mask all interrupts in the at86rf2xx chips before
deregister the device.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Mar 2014 22:52:04 +0000 (18:52 -0400)]
Merge branch 'xen-netback'
Paul Durrant says:
====================
xen-netback: fix rx slot estimation
Sander Eikelenboom reported an issue with ring overflow in netback in
3.14-rc3. This turns outo be be because of a bug in the ring slot estimation
code. This patch series fixes the slot estimation, fixes the BUG_ON() that
was supposed to catch the issue that Sander ran into and also makes a small
fix to start_new_rx_buffer().
v3:
- Added a cap of MAX_SKB_FRAGS to estimate in patch #2
v2:
- Added BUG_ON() to patch #1
- Added more explanation to patch #3
====================
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Fri, 28 Mar 2014 11:39:07 +0000 (11:39 +0000)]
xen-netback: BUG_ON in xenvif_rx_action() not catching overflow
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption
that meta_slots_used == ring slots used. This is not necessarily the case
for GSO packets, because the non-prefix GSO protocol consumes one more ring
slot than meta-slot for the 'extra_info'. This patch changes the test to
actually check ring slots.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Fri, 28 Mar 2014 11:39:06 +0000 (11:39 +0000)]
xen-netback: worse-case estimate in xenvif_rx_action is underestimating
The worse-case estimate for skb ring slot usage in xenvif_rx_action()
fails to take fragment page_offset into account. The page_offset does,
however, affect the number of times the fragmentation code calls
start_new_rx_buffer() (i.e. consume another slot) and the worse-case
should assume that will always return true. This patch adds the page_offset
into the DIV_ROUND_UP for each frag.
Unfortunately some frontends aggressively limit the number of requests
they post into the shared ring so to avoid an estimate that is 'too'
pessimal it is capped at MAX_SKB_FRAGS.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Fri, 28 Mar 2014 11:39:05 +0000 (11:39 +0000)]
xen-netback: remove pointless clause from if statement
This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 28 Mar 2014 22:09:37 +0000 (15:09 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) We've discovered a common error in several networking drivers, they
put VLAN offload features into ->vlan_features, which would suggest
that they support offloading 2 or more levels of VLAN encapsulation.
Not only do these devices not do that, but we don't have the
infrastructure yet to handle that at all.
Fixes from Vlad Yasevich.
2) Fix tcpdump crash with bridging and vlans, also from Vlad.
3) Some MAINTAINERS updates for random32 and bonding.
4) Fix late reseeds of prandom generator, from Sasha Levin.
5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki
Makita.
6) Fix deadlock in openvswitch, from Flavio Leitner.
7) get_timewait4_sock() doesn't report delay times correctly, fix from
Eric Dumazet.
8) Duplicate address detection and addrconf verification need to run in
contexts where RTNL can be obtained. Move them to run from a
workqueue. From Hannes Frederic Sowa.
9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar.
10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets,
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
vlan: Warn the user if lowerdev has bad vlan features.
veth: Turn off vlan rx acceleration in vlan_features
ifb: Remove vlan acceleration from vlan_features
qlge: Do not propaged vlan tag offloads to vlans
bridge: Fix crash with vlan filtering and tcpdump
net: Account for all vlan headers in skb_mac_gso_segment
MAINTAINERS: bonding: change email address
MAINTAINERS: bonding: change email address
ipv6: move DAD and addrconf_verify processing to workqueue
tcp: fix get_timewait4_sock() delay computation on 64bit
openvswitch: fix a possible deadlock and lockdep warning
bridge: Fix handling stacked vlan tags
bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
vhost: validate vhost_get_vq_desc return value
vhost: fix total length when packets are too short
random32: avoid attempt to late reseed if in the middle of seeding
random32: assign to network folks in MAINTAINERS
net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
vlan: Set hard_header_len according to available acceleration
...
David S. Miller [Fri, 28 Mar 2014 21:17:16 +0000 (17:17 -0400)]
Merge branch 'vlan_offloads'
Vlad Yasevich says:
====================
Audit all drivers for correct vlan_features.
Some drivers set vlan acceleration features in vlan_features. This causes
issues with Q-in-Q/802.1ad configurations.
Audit all the drivers for correct vlan_features. Fix broken ones.
Add a warning to vlan code to help catch future offenders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 28 Mar 2014 02:14:49 +0000 (22:14 -0400)]
vlan: Warn the user if lowerdev has bad vlan features.
Some drivers incorrectly assign vlan acceleration features to
vlan_features thus causing issues for Q-in-Q vlan configurations.
Warn the user of such cases.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 28 Mar 2014 02:14:48 +0000 (22:14 -0400)]
veth: Turn off vlan rx acceleration in vlan_features
For completeness, turn off vlan rx acceleration in vlan_features so
that it doesn't show up on q-in-q setups.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 28 Mar 2014 02:14:47 +0000 (22:14 -0400)]
ifb: Remove vlan acceleration from vlan_features
Do not include vlan acceleration features in vlan_features as that
precludes correct Q-in-Q operation.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 28 Mar 2014 02:14:46 +0000 (22:14 -0400)]
qlge: Do not propaged vlan tag offloads to vlans
qlge driver turns off NETIF_F_HW_CTAG_FILTER, but forgets to
turn off HW_CTAG_TX and HW_CTAG_RX on vlan devices. With the
current settings, q-in-q will only generate a single vlan header.
Remember to mask off CTAG_TX and CTAG_RX features in vlan_features.
CC: Shahed Shaikh <shahed.shaikh@qlogic.com>
CC: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
CC: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 28 Mar 2014 01:51:18 +0000 (21:51 -0400)]
bridge: Fix crash with vlan filtering and tcpdump
When the vlan filtering is enabled on the bridge, but
the filter is not configured on the bridge device itself,
running tcpdump on the bridge device will result in a
an Oops with NULL pointer dereference. The reason
is that br_pass_frame_up() will bypass the vlan
check because promisc flag is set. It will then try
to get the table pointer and process the packet based
on the table. Since the table pointer is NULL, we oops.
Catch this special condition in br_handle_vlan().
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Thu, 27 Mar 2014 21:26:18 +0000 (17:26 -0400)]
net: Account for all vlan headers in skb_mac_gso_segment
skb_network_protocol() already accounts for multiple vlan
headers that may be present in the skb. However, skb_mac_gso_segment()
doesn't know anything about it and assumes that skb->mac_len
is set correctly to skip all mac headers. That may not
always be the case. If we are simply forwarding the packet (via
bridge or macvtap), all vlan headers may not be accounted for.
A simple solution is to allow skb_network_protocol to return
the vlan depth it has calculated. This way skb_mac_gso_segment
will correctly skip all mac headers.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 27 Mar 2014 17:43:50 +0000 (18:43 +0100)]
MAINTAINERS: bonding: change email address
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh [Thu, 27 Mar 2014 17:33:44 +0000 (10:33 -0700)]
MAINTAINERS: bonding: change email address
Update my email address.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 28 Mar 2014 20:57:13 +0000 (13:57 -0700)]
Merge branch 'akpm' (patches from Andrew Morton)
Merge two fixes from Andrew Morton:
"The x86 fix should come from x86 guys but they appear to be
conferencing or otherwise distracted.
The ocfs2 fix is a bit of a mess - the code runs into an immediate
NULL deref and we're trying to work out how this got through test and
review, but we haven't heard from Goldwyn in the past few days.
Sasha's patch fixes the oops, but the feature as a whole is probably
broken. So this is a stopgap for 3.14 - I'll aim to get the real
fixes into 3.14.x"
* emailed patches from Andrew Morton akpm@linux-foundation.org>:
x86: fix boot on uniprocessor systems
ocfs2: check if cluster name exists before deref
Artem Fetishev [Fri, 28 Mar 2014 20:33:39 +0000 (13:33 -0700)]
x86: fix boot on uniprocessor systems
On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().
See arch/x86/kernel/cpu/perf_event_intel_rapl.c.
It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too. Enabling them also fixes
rapl_pmu code.
Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin [Fri, 28 Mar 2014 20:33:38 +0000 (13:33 -0700)]
ocfs2: check if cluster name exists before deref
Commit
c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: strlcpy (lib/string.c:388 lib/string.c:151)
CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W
3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
RIP: strlcpy (lib/string.c:388 lib/string.c:151)
Call Trace:
ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
vfs_mkdir (fs/namei.c:3467)
SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
tracesys (arch/x86/kernel/entry_64.S:749)
akpm: this patch probably disables the feature. A temporary thing to
avoid triviel oopses.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hannes Frederic Sowa [Thu, 27 Mar 2014 17:28:07 +0000 (18:28 +0100)]
ipv6: move DAD and addrconf_verify processing to workqueue
addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.
A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.
To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.
(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).
addrconf_verify_lock is removed, too. After the transition it is not
needed any more.
As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.
Relevant backtrace:
[ 541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1
[ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 541.031146]
ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[ 541.031148]
0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[ 541.031150]
0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[ 541.031152] Call Trace:
[ 541.031153] <IRQ> [<
ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[ 541.031180] [<
ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[ 541.031183] [<
ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[ 541.031185] [<
ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[ 541.031189] [<
ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[ 541.031198] [<
ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[ 541.031208] [<
ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[ 541.031212] [<
ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[ 541.031216] [<
ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[ 541.031219] [<
ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[ 541.031223] [<
ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[ 541.031226] [<
ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[ 541.031229] [<
ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[ 541.031233] [<
ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[ 541.031241] [<
ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[ 541.031244] [<
ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[ 541.031247] [<
ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[ 541.031255] [<
ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[ 541.031258] [<
ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
Hunks and backtrace stolen from a patch by Stephen Hemminger.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 27 Mar 2014 14:19:19 +0000 (07:19 -0700)]
tcp: fix get_timewait4_sock() delay computation on 64bit
It seems I missed one change in get_timewait4_sock() to compute
the remaining time before deletion of IPV4 timewait socket.
This could result in wrong output in /proc/net/tcp for tm->when field.
Fixes: 96f817fedec4 ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Thu, 27 Mar 2014 14:05:34 +0000 (11:05 -0300)]
openvswitch: fix a possible deadlock and lockdep warning
There are two problematic situations.
A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.
The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:
CPU#0 CPU#1
ovs_flow_stats_get()
stats_read()
+->spin_lock remote CPU#1 ovs_flow_stats_get()
| <interrupted> stats_read()
| ... +--> spin_lock remote CPU#0
| | <interrupted>
| ovs_flow_stats_update() | ...
| spin_lock local CPU#0 <--+ ovs_flow_stats_update()
+---------------------------------- spin_lock local CPU#1
This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <jesse@nicira.com>
=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<
ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<
ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<
ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<
ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<
ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<
ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<
ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<
ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<
ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<
ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<
ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<
ffffffff816c0798>] genl_rcv+0x28/0x40
[<
ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<
ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<
ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<
ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<
ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<
ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<
ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp:
1740726
hardirqs last enabled at (
1740726): [<
ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (
1740725): [<
ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last enabled at (
1740674): [<
ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (
1740675): [<
ffffffff8109db05>] irq_exit+0xc5/0xd0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&cpu_stats->lock)->rlock);
<Interrupt>
lock(&(&cpu_stats->lock)->rlock);
*** DEADLOCK ***
5 locks held by swapper/0/0:
#0: (((&ifa->dad_timer))){+.-...}, at: [<
ffffffff810a7155>] call_timer_fn+0x5/0x320
#1: (rcu_read_lock){.+.+..}, at: [<
ffffffff81788a55>] mld_sendpack+0x5/0x4a0
#2: (rcu_read_lock_bh){.+....}, at: [<
ffffffff8175d149>] ip6_finish_output2+0x59/0x840
#3: (rcu_read_lock_bh){.+....}, at: [<
ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
#4: (rcu_read_lock){.+.+..}, at: [<
ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G I
3.14.0-rc8-00007-g632b06a #1
Hardware name: /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
<IRQ> [<
ffffffff817cfe3c>] dump_stack+0x4d/0x66
[<
ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
[<
ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
[<
ffffffff810f8963>] mark_lock+0x223/0x2b0
[<
ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
[<
ffffffff810f5707>] ? __lock_is_held+0x57/0x80
[<
ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
[<
ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<
ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
[<
ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<
ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
[<
ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
[<
ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
[<
ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
[<
ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<
ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
[<
ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
[<
ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
[<
ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
[<
ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
[<
ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
[<
ffffffff8168c430>] dev_queue_xmit+0x10/0x20
[<
ffffffff8175d641>] ip6_finish_output2+0x551/0x840
[<
ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
[<
ffffffff8176128a>] ip6_finish_output+0x9a/0x220
[<
ffffffff8176145f>] ip6_output+0x4f/0x1f0
[<
ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
[<
ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
[<
ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
[<
ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
[<
ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
[<
ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
[<
ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
[<
ffffffff810a71e9>] call_timer_fn+0x99/0x320
[<
ffffffff810a7155>] ? call_timer_fn+0x5/0x320
[<
ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
[<
ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
[<
ffffffff8109d47d>] __do_softirq+0x12d/0x480
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Thu, 27 Mar 2014 12:46:56 +0000 (21:46 +0900)]
bridge: Fix handling stacked vlan tags
If a bridge with vlan_filtering enabled receives frames with stacked
vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not
only the outer tag but also the inner tag.
br_vlan_untag() is called only from br_handle_vlan(), and in this case,
it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already
been set before calling br_handle_vlan().
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Thu, 27 Mar 2014 12:46:55 +0000 (21:46 +0900)]
bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci
if they are tagged, but if vlan tx offload is manually disabled on bridge
device and frames are sent from vlan device on the bridge device, the tags
are embedded in skb->data and they break this assumption.
Extract embedded vlan tags and move them to vlan_tci at ingress.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 27 Mar 2014 10:53:37 +0000 (12:53 +0200)]
vhost: validate vhost_get_vq_desc return value
vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.
The code in question was introduced in commit
8dd014adfea6f173c1ef6378f7e5e7924866c923
vhost-net: mergeable buffers support
CVE-2014-0055
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 27 Mar 2014 10:00:26 +0000 (12:00 +0200)]
vhost: fix total length when packets are too short
When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.
This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.
Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.
Fix this up by detecting this overrun and doing packet drop
immediately.
CVE-2014-0077
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha Levin [Fri, 28 Mar 2014 16:38:42 +0000 (17:38 +0100)]
random32: avoid attempt to late reseed if in the middle of seeding
Commit
4af712e8df ("random32: add prandom_reseed_late() and call when
nonblocking pool becomes initialized") has added a late reseed stage
that happens as soon as the nonblocking pool is marked as initialized.
This fails in the case that the nonblocking pool gets initialized
during __prandom_reseed()'s call to get_random_bytes(). In that case
we'd double back into __prandom_reseed() in an attempt to do a late
reseed - deadlocking on 'lock' early on in the boot process.
Instead, just avoid even waiting to do a reseed if a reseed is already
occuring.
Fixes: 4af712e8df99 ("random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 28 Mar 2014 20:03:00 +0000 (13:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"Updates to Synaptics touchpad to better cope with devices in Lenovo
laptops, and a couple more fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics - add manual min/max quirk for ThinkPad X240
Input: synaptics - add manual min/max quirk
Input: cypress_ps2 - don't report as a button pads
Input: da9052_onkey - use correct register bit for key status
Input: adp5588-keys - get value from data out when dir is out
Sasha Levin [Thu, 27 Mar 2014 06:01:34 +0000 (02:01 -0400)]
random32: assign to network folks in MAINTAINERS
lib/random32.c was split out of the network code and is de-facto
still maintained by the almighty net/ gods.
Make it a bit more official so that people who aren't aware of
that know where to send their patches.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 28 Mar 2014 17:58:10 +0000 (10:58 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"I didn't want these to wait for stable cycle.
The nouveau and radeon ones are the same problem, where the runtime pm
stuff broke non-runtime pm managed secondary GPUs.
The udl fix is for an oops on unplug, and the i915 fix is for a
regression on Sandybridge even though it may break haswell (regression
wins)"
Daniel Vetter comments:
"My apologies for the i915 regression fumble, that thing somehow fell
through the cracks here for almost half a year :( Imo that's more than
enough flailing to just go ahead with the revert, and the re-broken
hsw should get peoples attention ..."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Undo gtt scratch pte unmapping again
drm/radeon: fix runtime suspend breaking secondary GPUs
drm/nouveau: fail runtime pm properly.
drm/udl: take reference to device struct for dma-bufs
Linus Torvalds [Fri, 28 Mar 2014 17:55:44 +0000 (10:55 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c build fix from Wolfram Sang:
"The build fix from my last request unveiled another build problem
which is fixed with this patch"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cpm: Fix build by adding of_address.h and of_irq.h
Linus Torvalds [Fri, 28 Mar 2014 17:52:05 +0000 (10:52 -0700)]
Merge tag 'stable/for-linus-3.14-rc8-tag' of git://git./linux/kernel/git/xen/tip
Pull Xen bugfixes from David Vrabel:
"Fix two bugs that cause x86 PV guest crashes.
1. Ballooning a 32-bit guest would eventually crash it.
2. Revert a broken fix for a regression with NUMA_BALACING. The bad
fix caused PV guests to crash after migration. This is not ideal
but unpicking the madness that is _PAGE_NUMA == _PAGE_PROTNONE will
take a while longer"
* tag 'stable/for-linus-3.14-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Revert "xen: properly account for _PAGE_NUMA during xen pte translations"
xen/balloon: flush persistent kmaps in correct position
Hans de Goede [Fri, 28 Mar 2014 08:01:38 +0000 (01:01 -0700)]
Input: synaptics - add manual min/max quirk for ThinkPad X240
This extends Benjamin Tissoires manual min/max quirk table with support for
the ThinkPad X240.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Benjamin Tissoires [Fri, 28 Mar 2014 07:43:00 +0000 (00:43 -0700)]
Input: synaptics - add manual min/max quirk
The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad.
However, these new Synaptics devices report bad axis ranges.
Under Windows, it is not a problem because the Windows driver uses RMI4
over SMBus to talk to the device. Under Linux, we are using the PS/2
fallback interface and it occurs the reported ranges are wrong.
Of course, it would be too easy to have only one range for the whole
series, each touchpad seems to be calibrated in a different way.
We can not use SMBus to get the actual range because I suspect the firmware
will switch into the SMBus mode and stop talking through PS/2 (this is the
case for hybrid HID over I2C / PS/2 Synaptics touchpads).
So as a temporary solution (until RMI4 land into upstream), start a new
list of quirks with the min/max manually set.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Daniel Vetter [Wed, 26 Mar 2014 19:10:09 +0000 (20:10 +0100)]
drm/i915: Undo gtt scratch pte unmapping again
It apparently blows up on some machines. This functionally reverts
commit
828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Wed Oct 16 09:21:30 2013 -0700
drm/i915: Disable GGTT PTEs on GEN6+ suspend
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841
Reported-and-Tested-by: Brad Jackson <bjackson0971@gmail.com>
Cc: stable@vger.kernel.org
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Todd Previte <tprevite@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 27 Mar 2014 02:31:08 +0000 (02:31 +0000)]
drm/radeon: fix runtime suspend breaking secondary GPUs
Same fix as for nouveau, when we fail with EINVAL, subsequent
gets fail hard, causing the device not to open.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Wei Yang [Thu, 27 Mar 2014 01:28:31 +0000 (09:28 +0800)]
net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
The second parameter of __mlx4_init_one() is used to identify whether the
pci_dev is a PF or VF. Currently, when it is invoked in mlx4_pci_slot_reset()
this information is missed.
This patch match the pci_dev with mlx4_pci_table and passes the
pci_device_id.driver_data to __mlx4_init_one() in mlx4_pci_slot_reset().
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Wed, 26 Mar 2014 22:37:45 +0000 (22:37 +0000)]
core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 26 Mar 2014 15:47:56 +0000 (11:47 -0400)]
vlan: Set hard_header_len according to available acceleration
Currently, if the card supports CTAG acceleration we do not
account for the vlan header even if we are configuring an
8021AD vlan. This may not be best since we'll do software
tagging for 8021AD which will cause data copy on skb head expansion
Configure the length based on available hw offload capabilities and
vlan protocol.
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Wed, 26 Mar 2014 13:32:51 +0000 (14:32 +0100)]
usbnet: include wait queue head in device structure
This fixes a race which happens by freeing an object on the stack.
Quoting Julius:
> The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().
The fix is to just not allocate the data structure on the stack.
As dev->wait is abused as a flag it also takes a runtime PM change
to fix this bug.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Reported-by: Grant Grundler <grundler@google.com>
Tested-by: Grant Grundler <grundler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 26 Mar 2014 05:03:00 +0000 (13:03 +0800)]
virtio-net: correct error handling of virtqueue_kick()
Current error handling of virtqueue_kick() was wrong in two places:
- The skb were freed immediately when virtqueue_kick() fail during
xmit. This may lead double free since the skb was not detached from
the virtqueue.
- try_fill_recv() returns false when virtqueue_kick() fail. This will
lead unnecessary rescheduling of refill work.
Actually, it's safe to just ignore the kick failure in those two
places. So this patch fixes this by partially revert commit
67975901183799af8e93ec60e322f9e2a1940b9b.
Fixes
67975901183799af8e93ec60e322f9e2a1940b9b
(virtio_net: verify if virtqueue_kick() succeeded).
Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kara [Wed, 26 Mar 2014 05:20:14 +0000 (06:20 +0100)]
vfs: Allocate anon_inode_inode in anon_inode_init()
Currently we allocated anon_inode_inode in anon_inodefs_mount. This is
somewhat fragile as if that function ever gets called again, it will
overwrite anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().
Signed-off-by: Jan Kara <jack@suse.cz>
[ Further simplified on suggestion from Dave Jones ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Wed, 26 Mar 2014 04:09:37 +0000 (14:09 +1000)]
drm/nouveau: fail runtime pm properly.
If we were on a non-optimus device, we'd return -EINVAL, this would
lead to the over engineered runtime pm system to go into an error
state, subsequent get_sync's would fail, so we'd never be able
to open the device again.
(like really get_sync shouldn't fail if the device isn't powered
down).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Tue, 25 Mar 2014 04:38:44 +0000 (14:38 +1000)]
drm/udl: take reference to device struct for dma-bufs
this stops the device from being deleted before all the dma-bufs
on it are freed, this fixes an oops when you unplug a udl device while
it has imported a buffer from another device.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Eric Dumazet [Wed, 26 Mar 2014 01:42:27 +0000 (18:42 -0700)]
net: unix: non blocking recvmsg() should not return -EINTR
Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit
b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").
To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.
Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Mar 2014 20:53:04 +0000 (16:53 -0400)]
Merge branch 'mvneta'
Thomas Petazzoni says:
====================
net: mvneta: fix usage as a module
The following set of two patches fix the usage of the mvneta driver
when built as a module, and used in RGMII configurations. It is
somewhat similar to a previous fix that was made by Arnaud Patard, but
which was limited to SGMII configurations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Tue, 25 Mar 2014 23:26:55 +0000 (00:26 +0100)]
net: mvneta: use devm_ioremap_resource() instead of of_iomap()
The mvneta driver currently uses of_iomap(), which has two drawbacks:
it doesn't request the resource, and it isn't devm-style so some error
handling is needed.
This commit switches to use devm_ioremap_resource() instead, which
automatically requests the resource (so the I/O registers region shows
up properly in /proc/iomem), and also is devm-style, which allows to
get rid of some error handling to unmap the I/O registers region.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Tue, 25 Mar 2014 23:25:42 +0000 (00:25 +0100)]
net: mvneta: fix usage as a module on RGMII configurations
Commit
5445eaf309ff ('mvneta: Try to fix mvneta when compiled as
module') fixed the mvneta driver to make it work properly when loaded
as a module in SGMII configuration, which was tested successful by the
author on the Armada XP OpenBlocks AX3, which uses SGMII.
However, it turns out that the Armada XP GP, which uses RGMII, is
affected by a similar problem: its SERDES configuration is lost when
mvneta is loaded as a module, because this configuration is set by the
bootloader, and then lost because the clock is gated by the clock
framework until the mvneta driver is loaded again and the clock is
re-enabled.
However, it turns out that for the RGMII case, setting the SERDES
configuration is not sufficient: the PCS enable bit in the
MVNETA_GMAC_CTRL_2 register must also be set, like in the SGMII
configuration.
Therefore, this commit reworks the SGMII/RGMII initialization: the
only difference between the two now is a different SERDES
configuration, all the rest is identical.
In detail, to achieve this, the commit:
* Renames MVNETA_SGMII_SERDES_CFG to MVNETA_SERDES_CFG because it is
not specific to SGMII, but also used on RGMII configurations.
* Adds a MVNETA_RGMII_SERDES_PROTO definition, that must be used as
the MVNETA_SERDES_CFG value in RGMII configurations.
* Removes the mvneta_gmac_rgmii_set() and mvneta_port_sgmii_config()
functions, and instead directly do the SGMII/RGMII configuration in
mvneta_port_up(), from where those functions where called. It is
worth mentioning that mvneta_gmac_rgmii_set() had an 'enable'
parameter that was always passed as '1', so it was pretty useless.
* Reworks the mvneta_port_up() function to set the MVNETA_SERDES_CFG
register to the appropriate value depending on the RGMII vs. SGMII
configuration. It also unconditionally set the PCS_ENABLE bit (was
already done for SGMII, but is now also needed for RGMII), and sets
the PORT_RGMII bit (which was already done for both SGMII and
RGMII).
This commit was successfully tested with mvneta compiled as a module,
on both the OpenBlocks AX3 (SGMII configuration) and the Armada XP GP
(RGMII configuration).
Reported-by: Steve McIntyre <steve@einval.com>
Cc: stable@vger.kernel.org # 3.11.x: 5445eaf309ff mvneta: Try to fix mvneta when compiled as module
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Tue, 25 Mar 2014 23:25:41 +0000 (00:25 +0100)]
net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE
Bit 3 of the MVNETA_GMAC_CTRL_2 is actually used to enable the PCS,
not the PSC: there was a typo in the name of the define, which this
commit fixes.
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Wed, 26 Mar 2014 20:30:52 +0000 (13:30 -0700)]
Input: cypress_ps2 - don't report as a button pads
The cypress PS/2 trackpad models supported by the cypress_ps2 driver
emulate BTN_RIGHT events in firmware based on the finger position, as part
of this no motion events are sent when the finger is in the button area.
The INPUT_PROP_BUTTONPAD property is there to indicate to userspace that
BTN_RIGHT events should be emulated in userspace, which is not necessary
in this case.
When INPUT_PROP_BUTTONPAD is advertised userspace will wait for a motion
event before propagating the button event higher up the stack, as it needs
current abs x + y data for its BTN_RIGHT emulation. Since in the
cypress_ps2 pads don't report motion events in the button area, this means
that clicks in the button area end up being ignored, so
INPUT_PROP_BUTTONPAD actually causes problems for these touchpads, and
removing it fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=76341
Reported-by: Adam Williamson <awilliam@redhat.com>
Tested-by: Adam Williamson <awilliam@redhat.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Vlad Yasevich [Mon, 24 Mar 2014 21:52:12 +0000 (17:52 -0400)]
tg3: Do not include vlan acceleration features in vlan_features
Including hardware acceleration features in vlan_features breaks
stacked vlans (Q-in-Q) by marking the bottom vlan interface as
capable of acceleration. This causes one of the tags to be lost
and the packets are sent with a sing vlan header.
CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Mon, 24 Mar 2014 05:06:36 +0000 (22:06 -0700)]
ip_tunnel: Fix dst ref-count.
Commit
10ddceb22ba (ip_tunnel:multicast process cause panic due
to skb->_skb_refdst NULL pointer) removed dst-drop call from
ip-tunnel-recv.
Following commit reintroduce dst-drop and fix the original bug by
checking loopback packet before releasing dst.
Original bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 26 Mar 2014 16:09:18 +0000 (09:09 -0700)]
Merge tag 'trace-fixes-v3.14-rc7-v2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"While on my flight to Linux Collaboration Summit, I was working on my
slides for the event trigger tutorial. I booted a 3.14-rc7 kernel to
perform what I wanted to teach and cut and paste it into my slides.
When I tried the traceon event trigger with a condition attached to it
(turns tracing on only if a field of the trigger event matches a
condition set by the user), nothing happened. Tracing would not turn
on. I stopped working on my presentation in order to find what was
wrong.
It ended up being the way trace event triggers work when they have
conditions. Instead of copying the fields, the condition code just
looks at the fields that were copied into the ring buffer. This works
great, unless tracing is off. That's because when the event is
reserved on the ring buffer, the ring buffer returns a NULL pointer,
this tells the tracing code that the ring buffer is disabled. This
ends up being a problem for the traceon trigger if it is using this
information to check its condition.
Luckily the code that checks if tracing is on returns the ring buffer
to use (because the ring buffer is determined by the event file also
passed to that field). I was able to easily solve this bug by
checking in that helper function if the returned ring buffer entry is
NULL, and if so, also check the file flag if it has a trace event
trigger condition, and if so, to pass back a temp ring buffer to use.
This will allow the trace event trigger condition to still test the
event fields, but nothing will be recorded"
* tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix traceon trigger condition to actually turn tracing on
Steven Rostedt (Red Hat) [Wed, 26 Mar 2014 03:39:41 +0000 (23:39 -0400)]
tracing: Fix traceon trigger condition to actually turn tracing on
While working on my tutorial for 2014 Linux Collaboration Summit
I found that the traceon trigger did not work when conditions were
used. The other triggers worked fine though. Looking into it, it
is because of the way the triggers use the ring buffer to store
the fields it will use for the condition. But if tracing is off, nothing
is stored in the buffer, and the tracepoint exits before calling the
trigger to test the condition. This is fine for all the triggers that
only work when tracing is on, but for traceon trigger that is to
work when tracing is off, nothing happens.
The fix is simple, just use a temp ring buffer to record the event
if tracing is off and the event has a trace event conditional trigger
enabled. The rest of the tracepoint code will work just fine, but
the tracepoint wont be recorded in the other buffers.
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Wed, 26 Mar 2014 00:43:34 +0000 (17:43 -0700)]
fs: remove now stale label in anon_inode_init()
The previous commit removed the register_filesystem() call and the
associated error handling, but left the label for the error path that no
longer exists. Remove that too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 25 Mar 2014 20:37:09 +0000 (21:37 +0100)]
fs: Avoid userspace mounting anon_inodefs filesystem
anon_inodefs filesystem is a kernel internal filesystem userspace
shouldn't mess with. Remove registration of it so userspace cannot
even try to mount it (which would fail anyway because the filesystem is
MS_NOUSER).
This fixes an oops triggered by trinity when it tried mounting
anon_inodefs which overwrote anon_inode_inode pointer while other CPU
has been in anon_inode_getfile() between ihold() and d_instantiate().
Thus effectively creating dentry pointing to an inode without holding a
reference to it.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 25 Mar 2014 22:24:11 +0000 (15:24 -0700)]
Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix frm Bruce Fields:
"J R Okajima sent this early and I was just slow to pass it along,
apologies. Fortunately it's a simple fix"
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
nfsd: fix lost nfserrno() call in nfsd_setattr()
Linus Torvalds [Tue, 25 Mar 2014 22:05:57 +0000 (15:05 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"These four commits are obvious fixes (a couple of fdget_pos()-related
ones from Eric Biggers, prepend_name() fix, missing checks for false
negatives from __lookup_mnt() in fs/namei.c)"
For now I'm pulling just the four obvious fixes, there's another four
pending in Al's 'for-linus' branch wrt the mnt_hash list that were more
involved.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rcuwalk: recheck mount_lock after mountpoint crossing attempts
make prepend_name() work correctly when called with negative *buflen
vfs: Don't let __fdget_pos() get FMODE_PATH files
vfs: atomic f_pos access in llseek()
David Vrabel [Tue, 25 Mar 2014 10:38:37 +0000 (10:38 +0000)]
Revert "xen: properly account for _PAGE_NUMA during xen pte translations"
This reverts commit
a9c8e4beeeb64c22b84c803747487857fe424b68.
PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.
This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.
This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain. Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.
Symptoms include "Bad page" reports when munmapping after migrating a
domain.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> [3.12+]
Wei Liu [Sat, 15 Mar 2014 16:11:47 +0000 (16:11 +0000)]
xen/balloon: flush persistent kmaps in correct position
Xen balloon driver will update ballooned out pages' P2M entries to point
to scratch page for PV guests. In
24f69373e2 ("xen/balloon: don't alloc
page while non-preemptible", kmap_flush_unused was moved after updating
P2M table. In that case for 32 bit PV guest we might end up with
P2M X -----> S (S is mfn of balloon scratch page)
M2P Y -----> X (Y is mfn in persistent kmap entry)
kmap_flush_unused() iterates through all the PTEs in the kmap address
space, using pte_to_page() to obtain the page. If the p2m and the m2p
are inconsistent the incorrect page is returned. This will clear
page->address on the wrong page which may cause subsequent oopses if
that page is currently kmap'ed.
Move the flush back between get_page and __set_phys_to_machine to fix
this.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: stable@vger.kernel.org # 3.12+
Linus Torvalds [Tue, 25 Mar 2014 02:31:17 +0000 (19:31 -0700)]
Linux 3.14-rc8
Linus Torvalds [Tue, 25 Mar 2014 00:36:58 +0000 (17:36 -0700)]
Merge branch 'parisc-3.14' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- revert parts of the latest patch regarding font selection with STICON
console
- wire up the utimes() syscall for parisc
- remove the unused parisc tmpalias code and unnecessary arch*relax
defines
* 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: locks: remove redundant arch_*_relax operations
parisc: wire up sys_utimes
parisc: Remove unused CONFIG_PARISC_TMPALIAS code
partly revert commit
8a10bc9: parisc/sti_console: prefer Linux fonts over built-in ROM fonts
Linus Torvalds [Tue, 25 Mar 2014 00:30:44 +0000 (17:30 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
1) Do serial locking in a way that makes things clear that these are
IRQ spinlocks.
2) Conversion to generic idle loop broke first generation Niagara
machines, need to have %pil interrupts enabled during cpu yield
hypervisor call.
3) Do not use magic constants for iterations over tsb tables, from Doug
Wilson.
4) Fix erroneous truncation of 64-bit system call return values to
32-bit. From Dave Kleikamp.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Make sure %pil interrupts are enabled during hypervisor yield.
sparc64:tsb.c:use array size macro rather than number
sparc64: don't treat 64-bit syscall return codes as 32-bit
sparc: serial: Clean up the locking for -rt
Linus Torvalds [Tue, 25 Mar 2014 00:07:24 +0000 (17:07 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) OpenVswitch's lookup_datapath() returns error pointers, so don't
check against NULL. From Jiri Pirko.
2) pfkey_compile_policy() code path tries to do a GFP_KERNEL allocation
under RCU locks, fix by using GFP_ATOMIC when necessary. From
Nikolay Aleksandrov.
3) phy_suspend() indirectly passes uninitialized data into the ethtool
get wake-on-land implementations. Fix from Sebastian Hesselbarth.
4) CPSW driver unregisters CPTS twice, fix from Benedikt Spranger.
5) If SKB allocation of reply packet fails, vxlan's arp_reduce() defers
a NULL pointer. Fix from David Stevens.
6) IPV6 neigh handling in vxlan doesn't validate the destination
address properly, and it builds a packet with the src and dst
reversed. Fix also from David Stevens.
7) Fix spinlock recursion during subscription failures in TIPC stack,
from Erik Hugne.
8) Revert buggy conversion of davinci_emac to devm_request_irq, from
Chrstian Riesch.
9) Wrong flags passed into forwarding database netlink notifications,
from Nicolas Dichtel.
10) The netpoll neighbour soliciation handler checks wrong ethertype,
needs to be ETH_P_IPV6 rather than ETH_P_ARP. Fix from Li RongQing.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
tipc: fix spinlock recursion bug for failed subscriptions
vxlan: fix nonfunctional neigh_reduce()
net: davinci_emac: Fix rollback of emac_dev_open()
net: davinci_emac: Replace devm_request_irq with request_irq
netpoll: fix the skb check in pkt_is_ns
net: micrel : ks8851-ml: add vdd-supply support
ip6mr: fix mfc notification flags
ipmr: fix mfc notification flags
rtnetlink: fix fdb notification flags
tcp: syncookies: do not use getnstimeofday()
netlink: fix setsockopt in mmap examples in documentation
openvswitch: Correctly report flow used times for first 5 minutes after boot.
via-rhine: Disable device in error path
ATHEROS-ATL1E: Convert iounmap to pci_iounmap
vxlan: fix potential NULL dereference in arp_reduce()
cnic: Update version to 2.5.20 and copyright year.
cnic,bnx2i,bnx2fc: Fix inconsistent use of page size
cnic: Use proper ulp_ops for per device operations.
net: cdc_ncm: fix control message ordering
ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly
...
Erik Hugne [Mon, 24 Mar 2014 15:56:38 +0000 (16:56 +0100)]
tipc: fix spinlock recursion bug for failed subscriptions
If a topology event subscription fails for any reason, such as out
of memory, max number reached or because we received an invalid
request the correct behavior is to terminate the subscribers
connection to the topology server. This is currently broken and
produces the following oops:
[27.953662] tipc: Subscription rejected, illegal request
[27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6
[27.957066] lock: 0xffff88003c67f408, .magic:
dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1
[27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5
[27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc]
[27.961430]
ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050
[27.962292]
ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a
[27.963152]
ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520
[27.964023] Call Trace:
[27.964292] [<
ffffffff815c0207>] dump_stack+0x45/0x56
[27.964874] [<
ffffffff815beec5>] spin_dump+0x8c/0x91
[27.965420] [<
ffffffff815beeeb>] spin_bug+0x21/0x26
[27.965995] [<
ffffffff81083df6>] do_raw_spin_lock+0x116/0x140
[27.966631] [<
ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20
[27.967256] [<
ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc]
[27.968051] [<
ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc]
[27.968722] [<
ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc]
[27.969436] [<
ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc]
[27.970209] [<
ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc]
[27.970972] [<
ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc]
[27.971633] [<
ffffffff8105dbf5>] process_one_work+0x165/0x3e0
[27.972267] [<
ffffffff8105e869>] worker_thread+0x119/0x3a0
[27.972896] [<
ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0
[27.973622] [<
ffffffff810648af>] kthread+0xdf/0x100
[27.974168] [<
ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
[27.974893] [<
ffffffff815ce13c>] ret_from_fork+0x7c/0xb0
[27.975466] [<
ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
The recursion occurs when subscr_terminate tries to grab the
subscriber lock, which is already taken by subscr_conn_msg_event.
We fix this by checking if the request to establish a new
subscription was successful, and if not we initiate termination of
the subscriber after we have released the subscriber lock.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Stevens [Mon, 24 Mar 2014 14:39:58 +0000 (10:39 -0400)]
vxlan: fix nonfunctional neigh_reduce()
The VXLAN neigh_reduce() code is completely non-functional since
check-in. Specific errors:
1) The original code drops all packets with a multicast destination address,
even though neighbor solicitations are sent to the solicited-node
address, a multicast address. The code after this check was never run.
2) The neighbor table lookup used the IPv6 header destination, which is the
solicited node address, rather than the target address from the
neighbor solicitation. So neighbor lookups would always fail if it
got this far. Also for L3MISSes.
3) The code calls ndisc_send_na(), which does a send on the tunnel device.
The context for neigh_reduce() is the transmit path, vxlan_xmit(),
where the host or a bridge-attached neighbor is trying to transmit
a neighbor solicitation. To respond to it, the tunnel endpoint needs
to do a *receive* of the appropriate neighbor advertisement. Doing a
send, would only try to send the advertisement, encapsulated, to the
remote destinations in the fdb -- hosts that definitely did not do the
corresponding solicitation.
4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the
isrouter flag in the advertisement. This has nothing to do with whether
or not the target is a router, and generally won't be set since the
tunnel endpoint is bridging, not routing, traffic.
The patch below creates a proxy neighbor advertisement to respond to
neighbor solicitions as intended, providing proper IPv6 support for neighbor
reduction.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Mar 2014 19:32:34 +0000 (15:32 -0400)]
Merge branch 'davinci_emac'
Christian Riesch says:
====================
net: davinci_emac: Fix interrupt requests and error handling
since commit
6892b41d9701283085b655c6086fb57a5d63fa47 (Linux 3.11) the
davinci_emac driver is broken. After doing ifconfig down, ifconfig up,
requesting the interrupts for the driver fails. The interface remains dead
until the board is rebooted.
The first patch in this patchset reverts commit
6892b41d9701283085b655c6086fb57a5d63fa47 partially and makes the driver
useable again.
During the work on the first patch, a number of bugs in the error handling
of the driver's ndo_open code were found. The second patch fixes these bugs.
I believe the first patch meets the rules for stable kernels, I therefore added
the stable tag to this patch. The second patch is just cleanup, the code
that is fixed by this patch is only executed in case of an error.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Riesch [Mon, 24 Mar 2014 12:46:27 +0000 (13:46 +0100)]
net: davinci_emac: Fix rollback of emac_dev_open()
If an error occurs during the initialization in emac_dev_open() (the
driver's ndo_open function), interrupts, DMA descriptors etc. must be freed.
The current rollback code is buggy in several ways.
1) Freeing the interrupts. The current code will not free all interrupts
that were requested by the driver. Furthermore, the code tries to do a
platform_get_resource(priv->pdev, IORESOURCE_IRQ, -1) in its last
iteration.
This patch fixes these bugs.
2) Wrong order of err: and rollback: labels. If the setup of the PHY in
the code fails, the interrupts that have been requested before are
not freed:
request irq
if requesting irqs fails, goto rollback
setup phy
if phy setup fails, goto err
return 0
rollback:
free irqs
err:
This patch brings the code into the correct order.
3) The code calls napi_enable() and emac_int_enable(), but does not
undo both in case of an error.
This patch adds calls of emac_int_disable() and napi_disable() to the
rollback code.
4) RX DMA descriptors are not freed in case of an error: Right before
requesting the irqs, the function creates DMA descriptors for the
RX channel. These RX descriptors are never freed when we jump to either
rollback or err.
This patch adds code for freeing the DMA descriptors in the case of
an initialization error. This required a modification of
cpdma_ctrl_stop() in davinci_cpdma.c: We must be able to call this
function to free the DMA descriptors while the DMA channels are
in IDLE state (before cpdma_ctlr_start() was called).
Tested on a custom board with the Texas Instruments AM1808.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Riesch [Mon, 24 Mar 2014 12:46:26 +0000 (13:46 +0100)]
net: davinci_emac: Replace devm_request_irq with request_irq
In commit
6892b41d9701283085b655c6086fb57a5d63fa47
Author: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Date: Tue Jun 25 21:24:51 2013 +0530
net: davinci: emac: Convert to devm_* api
the call of request_irq is replaced by devm_request_irq and the call
of free_irq is removed. But since interrupts are requested in
emac_dev_open, doing ifconfig up/down on the board requests the
interrupts again each time, causing devm_request_irq to fail. The
interface is dead until the device is rebooted.
This patch reverts said commit partially: It changes the driver back
to use request_irq instead of devm_request_irq, puts free_irq back in
place, but keeps the remaining changes of the original patch.
Reported-by: Jon Ringle <jon@ringle.org>
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 21 Mar 2014 12:53:57 +0000 (20:53 +0800)]
netpoll: fix the skb check in pkt_is_ns
Neighbor Solicitation is ipv6 protocol, so we should check
skb->protocol with ETH_P_IPV6
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Mar 2014 18:45:12 +0000 (14:45 -0400)]
sparc64: Make sure %pil interrupts are enabled during hypervisor yield.
In arch_cpu_idle() we must enable %pil based interrupts before
potentially invoking the hypervisor cpu yield call.
As per the Hypervisor API documentation for cpu_yield:
Interrupts which are blocked by some mechanism other that
pstate.ie (for example %pil) are not guaranteed to cause
a return from this service.
It seems that only first generation Niagara chips are hit by this
bug. My best guess is that later chips implement this in hardware
and wake up anyways from %pil events, whereas in first generation
chips the yield is implemented completely in hypervisor code and
requires %pil to be enabled in order to wake properly from this
call.
Fixes: 87fa05aeb3a5 ("sparc: Use generic idle loop")
Reported-by: Fabio M. Di Nitto <fabbione@fabbione.net>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Wood [Tue, 18 Mar 2014 21:10:24 +0000 (16:10 -0500)]
i2c: cpm: Fix build by adding of_address.h and of_irq.h
Fixes a build break due to the undeclared use of irq_of_parse_and_map()
and of_iomap(). This build break was apparently introduced while the
driver was unbuildable due to the bug fixed by
62c19c9d29e65086e5ae76df371ed2e6b23f00cd ("i2c: Remove usage of
orphaned symbol OF_I2C"). When 62c19c was added in v3.14-rc7,
the driver was enabled again, breaking the powerpc mpc85xx_defconfig
and mpc85xx_smp_defconfig.
62c19c is marked for stable, so this should go there as well.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Nishanth Menon [Fri, 21 Mar 2014 06:52:48 +0000 (01:52 -0500)]
net: micrel : ks8851-ml: add vdd-supply support
Few platforms use external regulator to keep the ethernet MAC supplied.
So, request and enable the regulator for driver functionality.
Fixes: 66fda75f47dc (regulator: core: Replace direct ops->disable usage)
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Suggested-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Will Deacon [Fri, 21 Feb 2014 17:34:57 +0000 (17:34 +0000)]
parisc: locks: remove redundant arch_*_relax operations
Now that the arch_{spin,read,write}_relax macros default to cpu_relax(),
remove the redundant definitions for parisc.
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Sun, 23 Mar 2014 14:24:40 +0000 (15:24 +0100)]
parisc: wire up sys_utimes
We seem to be nearly the only platform which does not provide the
sys_utimes syscall. Adding it now makes our life much easier with
userspace applications (like dietlibc and e2fsprogs) since we then
behave like all other platforms too and don't need extra patches which
are hard to get upstream anyway because we are not a mainstream
architecture.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v3.13
John David Anglin [Sat, 1 Mar 2014 22:41:22 +0000 (17:41 -0500)]
parisc: Remove unused CONFIG_PARISC_TMPALIAS code
The attached change removes the unused and experimental
CONFIG_PARISC_TMPALIAS code. It doesn't work and I don't believe it will
ever be used.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Sun, 23 Mar 2014 15:39:52 +0000 (16:39 +0100)]
partly revert commit
8a10bc9: parisc/sti_console: prefer Linux fonts over built-in ROM fonts
STI console is used on parisc and m68k HP machines. This patch partly reverts
my previous commit and as such restores the fonts for the m68k machines.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v3.13
Al Viro [Thu, 20 Mar 2014 19:18:22 +0000 (15:18 -0400)]
rcuwalk: recheck mount_lock after mountpoint crossing attempts
We can get false negative from __lookup_mnt() if an unrelated vfsmount
gets moved. In that case legitimize_mnt() is guaranteed to fail,
and we will fall back to non-RCU walk... unless we end up running
into a hard error on a filesystem object we wouldn't have reached
if not for that false negative. IOW, delaying that check until
the end of pathname resolution is wrong - we should recheck right
after we attempt to cross the mountpoint. We don't need to recheck
unless we see d_mountpoint() being true - in that case even if
we have just raced with mount/umount, we can simply go on as if
we'd come at the moment when the sucker wasn't a mountpoint; if we
run into a hard error as the result, it was a legitimate outcome.
__lookup_mnt() returning NULL is different in that respect, since
it might've happened due to operation on completely unrelated
mountpoint.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 23 Mar 2014 04:28:40 +0000 (00:28 -0400)]
make prepend_name() work correctly when called with negative *buflen
In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eric Biggers [Sun, 16 Mar 2014 20:47:48 +0000 (15:47 -0500)]
vfs: Don't let __fdget_pos() get FMODE_PATH files
Commit
bd2a31d522344 ("get rid of fget_light()") introduced the
__fdget_pos() function, which returns the resulting file pointer and
fdput flags combined in an 'unsigned long'. However, it also changed the
behavior to return files with FMODE_PATH set, which shouldn't happen
because read(), write(), lseek(), etc. aren't allowed on such files.
This commit restores the old behavior.
This regression actually had no effect on read() and write() since
FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
to fail with ESPIPE rather than EBADF.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eric Biggers [Sun, 16 Mar 2014 19:24:08 +0000 (14:24 -0500)]
vfs: atomic f_pos access in llseek()
Commit
9c225f2655e36a4 ("vfs: atomic f_pos accesses as per POSIX") changed
several system calls to use fdget_pos() instead of fdget(), but missed
sys_llseek(). Fix it.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 22 Mar 2014 16:13:59 +0000 (09:13 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two oneliner 'perf bench' tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf bench: Fix NULL pointer dereference in "perf bench all"
perf bench numa: Make no args mean 'run all tests'
Linus Torvalds [Sat, 22 Mar 2014 16:13:13 +0000 (09:13 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Only two patches this time, one to fix ethernet probe order on at91
(better fix with proper device aliasing will be done for 3.15, this is
stop-gap), and one update to MAINTAINERS due to Freescale moving their
repo to kernel.org"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: at91: fix network interface ordering for sama5d36
MAINTAINERS: update IMX kernel git tree
Linus Torvalds [Fri, 21 Mar 2014 05:32:28 +0000 (22:32 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Some final few intel fixes, all regressions, all stable cc, and one
exynos oops fixer.
The biggest is probably the intel display error irqs one, but it seems
to fix a few crashes on startup, and one use after free in drm core"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/exynos: Fix (more) freeing issues in exynos_drm_drv.c
drm/i915: Disable stolen memory when DMAR is active
Revert "drm/i915: don't touch the VDD when disabling the panel"
drm: Fix use-after-free in the shadow-attache exit code
drm/i915: Don't enable display error interrupts from the start
drm/i915: Fix scanline counter fixup on BDW
drm/i915: Add a workaround for HSW scanline counter weirdness
drm/i915: Fix PSR programming
Dave Jones [Thu, 20 Mar 2014 21:03:58 +0000 (15:03 -0600)]
block: free q->flush_rq in blk_init_allocated_queue error paths
Commit
7982e90c3a57 ("block: fix q->flush_rq NULL pointer crash on
dm-mpath flush") moved an allocation to blk_init_allocated_queue(), but
neglected to free that allocation on the error paths that follow.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 21 Mar 2014 05:11:17 +0000 (22:11 -0700)]
futex: revert back to the explicit waiter counting code
Srikar Dronamraju reports that commit
b0c29f79ecea ("futexes: Avoid
taking the hb->lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.
The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.
So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.
Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 21 Mar 2014 05:09:30 +0000 (22:09 -0700)]
Merge tag 'trace-fixes-v3.14-rc7' of git://git./linux/kernel/git/rostedt/linux-trace
Pull trace fix from Steven Rostedt:
"Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the
array index in the trace event format bogus.
He supplied an elegant solution that uses __stringify() and also
removes the need for the event_storage and event_storage_mutex and
also cuts off a few K of overhead from the trace events"
* tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix array size mismatch in format string
Hugh Dickins [Fri, 21 Mar 2014 04:52:17 +0000 (21:52 -0700)]
mm: fix swapops.h:131 bug if remap_file_pages raced migration
Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting
little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity:
indicating that remove_migration_ptes() failed to find one of the
migration entries that was temporarily inserted.
The problem comes from remap_file_pages()'s switch from vma_interval_tree
(good for inserting the migration entry) to i_mmap_nonlinear list (no good
for locating it again); but can only be a problem if the remap_file_pages()
range does not cover the whole of the vma (zap_pte() clears the range).
remove_migration_ptes() needs a file_nonlinear method to go down the
i_mmap_nonlinear list, applying linear location to look for migration
entries in those vmas too, just in case there was this race.
The file_nonlinear method does need rmap_walk_control.arg to do this;
but it never needed vma passed in - vma comes from its own iteration.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>