git.openwrt.org Git - openwrt/staging/blogic.git/log

mutex: Back out architecture specific check for negative mutex count

Linus suggested that probably all the supported architectures can
allow a negative mutex count without incorrect behavior, so we can
then back out the architecture specific change and allow the
mutex count to go to any negative number. That should further
reduce contention for non-x86 architecture.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-5-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mutex: Queue mutex spinners with MCS lock to reduce cacheline contention

The current mutex spinning code (with MUTEX_SPIN_ON_OWNER option
turned on) allow multiple tasks to spin on a single mutex
concurrently. A potential problem with the current approach is
that when the mutex becomes available, all the spinning tasks
will try to acquire the mutex more or less simultaneously. As a
result, there will be a lot of cacheline bouncing especially on
systems with a large number of CPUs.

This patch tries to reduce this kind of contention by putting
the mutex spinners into a queue so that only the first one in
the queue will try to acquire the mutex. This will reduce
contention and allow all the tasks to move forward faster.

The queuing of mutex spinners is done using an MCS lock based
implementation which will further reduce contention on the mutex
cacheline than a similar ticket spinlock based implementation.
This patch will add a new field into the mutex data structure
for holding the MCS lock. This expands the mutex size by 8 bytes
for 64-bit system and 4 bytes for 32-bit system. This overhead
will be avoid if the MUTEX_SPIN_ON_OWNER option is turned off.

The following table shows the jobs per minute (JPM) scalability
data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
numactl command is used to restrict the running of the fserver
workloads to 1/2/4/8 nodes with hyperthreading off.

+-----------------+-----------+-----------+-------------+----------+
|  Configuration  | Mean JPM  | Mean JPM  |  Mean JPM   | % Change |
|                 | w/o patch | patch 1   | patches 1&2 |  1->1&2  |
+-----------------+------------------------------------------------+
|                 |              User Range 1100 - 2000            |
+-----------------+------------------------------------------------+
| 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
| 4 nodes, HT off |  393503   |  381558   |   394650    |   +3.4%  |
| 2 nodes, HT off |  334957   |  325240   |   338853    |   +4.2%  |
| 1 node , HT off |  198141   |  197972   |   198075    |   +0.1%  |
+-----------------+------------------------------------------------+
|                 |              User Range 200 - 1000             |
+-----------------+------------------------------------------------+
| 8 nodes, HT off |  282325   |  312870   |   332185    |   +6.2%  |
| 4 nodes, HT off |  390698   |  378279   |   393419    |   +4.0%  |
| 2 nodes, HT off |  336986   |  326543   |   340260    |   +4.2%  |
| 1 node , HT off |  197588   |  197622   |   197582    |    0.0%  |
+-----------------+-----------+-----------+-------------+----------+

At low user range 10-100, the JPM differences were within +/-1%.
So they are not that interesting.

The fserver workload uses mutex spinning extensively. With just
the mutex change in the first patch, there is no noticeable
change in performance.  Rather, there is a slight drop in
performance. This mutex spinning patch more than recovers the
lost performance and show a significant increase of +30% at high
user load with the full 8 nodes. Similar improvements were also
seen in a 3.8 kernel.

The table below shows the %time spent by different kernel
functions as reported by perf when running the fserver workload
at 1500 users with all 8 nodes.

+-----------------------+-----------+---------+-------------+
|        Function       |  % time   | % time  |   % time    |
|                       | w/o patch | patch 1 | patches 1&2 |
+-----------------------+-----------+---------+-------------+
| __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
| __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
| mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
| mspin_lock            |    N/A    |   N/A   |    9.90%    |
| __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
| _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
+-----------------------+-----------+---------+-------------+

The fserver workload for an 8-node system is dominated by the
contention in the read/write lock. Mutex contention also plays a
role. With the first patch only, mutex contention is down (as
shown by the __mutex_lock_slowpath figure) which help a little
bit. We saw only a few percents improvement with that.

By applying patch 2 as well, the single mutex_spin_on_owner
figure is now split out into an additional mspin_lock figure.
The time increases from 3.42% to 11.23%. It shows a great
reduction in contention among the spinners leading to a 30%
improvement. The time ratio 9.9/2.33=4.3 indicates that there
are on average 4+ spinners waiting in the spin_lock loop for
each spinner in the mutex_spin_on_owner loop. Contention in
other locking functions also go down by quite a lot.

The table below shows the performance change of both patches 1 &
2 over patch 1 alone in other AIM7 workloads (at 8 nodes,
hyperthreading off).

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |      0.0%     |     -0.8%      |     +0.6%       |
| five_sec     |     -0.3%     |     +0.8%      |     +0.8%       |
| high_systime |     +0.4%     |     +2.4%      |     +2.1%       |
| new_fserver  |     +0.1%     |    +14.1%      |    +34.2%       |
| shared       |     -0.5%     |     -0.3%      |     -0.4%       |
| short        |     -1.7%     |     -9.8%      |     -8.3%       |
+--------------+---------------+----------------+-----------------+

The short workload is the only one that shows a decline in
performance probably due to the spinner locking and queuing
overhead.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mutex: Make more scalable by doing less atomic operations

In the __mutex_lock_common() function, an initial entry into
the lock slow path will cause two atomic_xchg instructions to be
issued. Together with the atomic decrement in the fast path, a
total of three atomic read-modify-write instructions will be
issued in rapid succession. This can cause a lot of cache
bouncing when many tasks are trying to acquire the mutex at the
same time.

This patch will reduce the number of atomic_xchg instructions
used by checking the counter value first before issuing the
instruction. The atomic_read() function is just a simple memory
read. The atomic_xchg() function, on the other hand, can be up
to 2 order of magnitude or even more in cost when compared with
atomic_read(). By using atomic_read() to check the value first
before calling atomic_xchg(), we can avoid a lot of unnecessary
cache coherency traffic. The only downside with this change is
that a task on the slow path will have a tiny bit less chance of
getting the mutex when competing with another task in the fast
path.

The same is true for the atomic_cmpxchg() function in the
mutex-spin-on-owner loop. So an atomic_read() is also performed
before calling atomic_cmpxchg().

The mutex locking and unlocking code for the x86 architecture
can allow any negative number to be used in the mutex count to
indicate that some tasks are waiting for the mutex. I am not so
sure if that is the case for the other architectures. So the
default is to avoid atomic_xchg() if the count has already been
set to -1. For x86, the check is modified to include all
negative numbers to cover a larger case.

The following table shows the jobs per minutes (JPM) scalability
data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
numactl command is used to restrict the running of the
high_systime workloads to 1/2/4/8 nodes with hyperthreading on
and off.

+-----------------+-----------+------------+----------+
|  Configuration  | Mean JPM  |  Mean JPM  | % Change |
|   | w/o patch | with patch |       |
+-----------------+-----------------------------------+
|   |      User Range 1100 - 2000       |
+-----------------+-----------------------------------+
| 8 nodes, HT on  |    36980   |   148590  | +301.8%  |
| 8 nodes, HT off |    42799   |   145011  | +238.8%  |
| 4 nodes, HT on  |    61318   |   118445  |  +51.1%  |
| 4 nodes, HT off |   158481   |   158592  |   +0.1%  |
| 2 nodes, HT on  |   180602   |   173967  |   -3.7%  |
| 2 nodes, HT off |   198409   |   198073  |   -0.2%  |
| 1 node , HT on  |   149042   |   147671  |   -0.9%  |
| 1 node , HT off |   126036   |   126533  |   +0.4%  |
+-----------------+-----------------------------------+
|   |       User Range 200 - 1000       |
+-----------------+-----------------------------------+
| 8 nodes, HT on  |   41525    |   122349  | +194.6%  |
| 8 nodes, HT off |   49866    |   124032  | +148.7%  |
| 4 nodes, HT on  |   66409    |   106984  |  +61.1%  |
| 4 nodes, HT off |  119880    |   130508  |   +8.9%  |
| 2 nodes, HT on  |  138003    |   133948  |   -2.9%  |
| 2 nodes, HT off |  132792    |   131997  |   -0.6%  |
| 1 node , HT on  |  116593    |   115859  |   -0.6%  |
| 1 node , HT off |  104499    |   104597  |   +0.1%  |
+-----------------+------------+-----------+----------+

At low user range 10-100, the JPM differences were within +/-1%.
So they are not that interesting.

AIM7 benchmark run has a pretty large run-to-run variance due to
random nature of the subtests executed. So a difference of less
than +-5% may not be really significant.

This patch improves high_systime workload performance at 4 nodes
and up by maintaining transaction rates without significant
drop-off at high node count.  The patch has practically no
impact on 1 and 2 nodes system.

The table below shows the percentage time (as reported by perf
record -a -s -g) spent on the __mutex_lock_slowpath() function
by the high_systime workload at 1500 users for 2/4/8-node
configurations with hyperthreading off.

+---------------+-----------------+------------------+---------+
| Configuration | %Time w/o patch | %Time with patch | %Change |
+---------------+-----------------+------------------+---------+
|    8 nodes    |      65.34%     |      0.69%       |  -99%   |
|    4 nodes    |       8.70%   |      1.02%      |  -88%   |
|    2 nodes    |       0.41%     |      0.32%       |  -22%   |
+---------------+-----------------+------------------+---------+

It is obvious that the dramatic performance improvement at 8
nodes was due to the drastic cut in the time spent within the
__mutex_lock_slowpath() function.

The table below show the improvements in other AIM7 workloads
(at 8 nodes, hyperthreading off).

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |     +0.6%     |   +104.2%      |   +185.9%       |
| five_sec     |     +1.9%     |     +0.9%      |     +0.9%       |
| fserver      |     +1.4%     |     -7.7%      |     +5.1%       |
| new_fserver  |     -0.5%     |     +3.2%      |     +3.1%       |
| shared       |    +13.1%     |   +146.1%      |   +181.5%       |
| short        |     +7.4%     |     +5.0%      |     +4.2%       |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Norton: Scott J <scott.norton@hp.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mutex: Move mutex spinning code from sched/core.c back to mutex.c

As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler
feature bit was really just an early hack to make with/without
mutex-spinning testable. So it is no longer necessary.

This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and
move the mutex spinning code from kernel/sched/core.c back to
kernel/mutex.c which is where they should belong.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

locking/rtmutex/tester: Set correct permissions on sysfs files

sysfs started complaining about cases where permissions don't
match what's in the sysfs ops structure (such as allowing read
without a "show" callback).

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: williams@redhat.com
Link: http://lkml.kernel.org/r/1363795105-5884-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

lockdep: Remove unnecessary 'hlock_next' variable

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1365058881-4044-1-git-send-email-honkiko@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Linux 3.9-rc6

Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fix from Gleb Natapov:
"Bugfix for the regression introduced by commit c300aa64ddf5"

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Allow cross page reads and writes from cached translations.

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Two quite small fixes: one a build problem, and the other fixes
  seccomp filters on x32."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix rebuild with EFI_STUB enabled
  x86: remove the x32 syscall bitmask from syscall_get_nr()

alpha: irq: remove deprecated use of IRQF_DISABLED

Interrupt handlers are always invoked with interrupts disabled, so
remove all uses of the deprecated IRQF_DISABLED flag.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

alpha: irq: run all handlers with interrupts disabled

Linux has expected that interrupt handlers are executed with local
interrupts disabled for a while now, so ensure that this is the case on
Alpha even for non-device interrupts such as IPIs.

Without this patch, secondary boot results in the following backtrace:

  warning: at kernel/softirq.c:139 __local_bh_enable+0xb8/0xd0()
  trace:
    __local_bh_enable+0xb8/0xd0
    irq_enter+0x74/0xa0
    scheduler_ipi+0x50/0x100
    handle_ipi+0x84/0x260
    do_entint+0x1ac/0x2e0
    irq_exit+0x60/0xa0
    handle_irq+0x98/0x100
    do_entint+0x2c8/0x2e0
    ret_from_sys_call+0x0/0x10
    load_balance+0x3e4/0x870
    cpu_idle+0x24/0x80
    rcu_eqs_enter_common.isra.38+0x0/0x120
    cpu_idle+0x40/0x80
    rest_init+0xc0/0xe0
    _stext+0x1c/0x20

A similar dump occurs if you try to reboot using magic-sysrq.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

alpha: makefile: don't enforce small data model for kernel builds

Due to all of the goodness being packed into today's kernels, the
resulting image isn't as slim as it once was.

In light of this, don't pass -msmall-data to gcc, which otherwise results
in link failures due to impossible relocations when compiling anything but
the most trivial configurations.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Thorsten Kranzkowski <dl8bcu@dl8bcu.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

alpha: Add irongate_io to PCI bus resources

Fixes a NULL pointer dereference at boot on UP1500.

Cc: stable@vger.kernel.org
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jay Estabrook <jay.estabrook@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KVM: Allow cross page reads and writes from cached translations.

This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page. If the range falls within
the same memslot, then this will be a fast operation. If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.

Tested: Test against kvm_clock unit tests.

Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

Merge tag 'dm-3.9-fixes-2' of git://git./linux/kernel/git/agk/linux-dm

Pull device-mapper fixes from Alasdair Kergon:
"A pair of patches to fix the writethrough mode of the device-mapper
  cache target when the device being cached is not itself wrapped with
  device-mapper."

* tag 'dm-3.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm cache: reduce bio front_pad size in writeback mode
  dm cache: fix writes to cache device in writethrough mode

Merge tag 'pci-v3.9-fixes-1' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"PCI updates for v3.9:

  ASPM
      Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
  kexec
      PCI: Don't try to disable Bus Master on disconnected PCI devices
  Platform ROM images
      PCI: Add PCI ROM helper for platform-provided ROM images
      nouveau: Attempt to use platform-provided ROM image
      radeon: Attempt to use platform-provided ROM image
  Hotplug
      PCI/ACPI: Always resume devices on ACPI wakeup notifications
      PCI/PM: Disable runtime PM of PCIe ports
  EISA
      EISA/PCI: Fix bus res reference
      EISA/PCI: Init EISA early, before PNP"

* tag 'pci-v3.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/PM: Disable runtime PM of PCIe ports
  PCI/ACPI: Always resume devices on ACPI wakeup notifications
  PCI: Don't try to disable Bus Master on disconnected PCI devices
  Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
  radeon: Attempt to use platform-provided ROM image
  nouveau: Attempt to use platform-provided ROM image
  EISA/PCI: Init EISA early, before PNP
  EISA/PCI: Fix bus res reference
  PCI: Add PCI ROM helper for platform-provided ROM images

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix erroneous sock_orphan() leading to crashes and double
    kfree_skb() in NFC protocol.  From Thierry Escande and Samuel Ortiz.

2) Fix use after free in remain-on-channel mac80211 code, from Johannes
    Berg.

3) nf_reset() needs to reset the NF tracing cookie, otherwise we can
    leak it from one namespace into another.  Fix from Gao Feng and
    Patrick McHardy.

4) Fix overflow in channel scanning array of mwifiex driver, from Stone
    Piao.

5) Fix loss of link after suspend/shutdown in r8169, from Hayes Wang.

6) Synchronization of unicast address lists to the undelying device
    doesn't work because whether to sync is maintained as a boolean
    rather than a true count.  Fix from Vlad Yasevich.

7) Fix corruption of TSO packets in atl1e by limiting the segmented
    packet length.  From Hannes Frederic Sowa.

8) Revert bogus AF_UNIX credential passing change and fix the
    coalescing issue properly, from Eric W Biederman.

9) Changes of ipv4 address lifetime settings needs to generate a
    notification, from Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
  netfilter: don't reset nf_trace in nf_reset()
  net: ipv4: notify when address lifetime changes
  ixgbe: fix registration order of driver and DCA nofitication
  af_unix: If we don't care about credentials coallesce all messages
  Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL"
  bonding: remove sysfs before removing devices
  atl1e: limit gso segment size to prevent generation of wrong ip length fields
  net: count hw_addr syncs so that unsync works properly.
  r8169: fix auto speed down issue
  netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths
  mwifiex: limit channel number not to overflow memory
  NFC: microread: Fix build failure due to a new MEI bus API
  iwlwifi: dvm: fix the passive-no-RX workaround
  netfilter: nf_conntrack: fix error return code
  NFC: llcp: Keep the connected socket parent pointer alive
  mac80211: fix idle handling sequence
  netfilter: nfnetlink_acct: return -EINVAL if object name is empty
  netfilter: nfnetlink_queue: fix error return code in nfnetlink_queue_init()
  netfilter: reset nf_trace in nf_reset
  mac80211: fix remain-on-channel cancel crash
  ...

x86: Fix rebuild with EFI_STUB enabled

eboot.o and efi_stub_$(BITS).o didn't get added to "targets", and hence
their .cmd files don't get included by the build machinery, leading to
the files always getting rebuilt.

Rather than adding the two files individually, take the opportunity and
add $(VMLINUX_OBJS) to "targets" instead, thus allowing the assignment
at the top of the file to be shrunk quite a bit.

At the same time, remove a pointless flags override line - the variable
assigned to was misspelled anyway, and the options added are
meaningless for assembly sources.

[ hpa: the patch is not minimal, but I am taking it for -urgent anyway
since the excess impact of the patch seems to be small enough. ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/515C5D2502000078000CA6AD@nat28.tlf.novell.com
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

netfilter: don't reset nf_trace in nf_reset()

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"Fixes for a number of small glitches in various corners of the MIPS
  tree.  No particular areas is standing out.

  With this applied all MIPS defconfigs are building fine.  No merge
  conflicts are expected."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Delete definition of SA_RESTORER.
  MIPS: Fix ISA level which causes secondary cache init bypassing and more
  MIPS: Fix build error cavium-octeon without CONFIG_SMP
  MIPS: Kconfig: Rename SNIPROM too
  MIPS: Alchemy: Fix typo "CONFIG_DEBUG_PCI"
  MIPS: Unbreak function tracer for 64-bit kernel.

Merge git://git./linux/kernel/git/steve/gfs2-3.0-fixes

Pull GFS2 fixes from Steven Whitehouse:
"There are two patches which fix up a couple of minor issues in the DLM
  interface code, a missing error path in gfs2_rs_alloc(), one patch
  which fixes a problem during "withdraw" and a fix for discards/FITRIM
  when using 4k sector sized devices."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Issue discards in 512b sectors
  GFS2: Fix unlock of fcntl locks during withdrawn state
  GFS2: return error if malloc failed in gfs2_rs_alloc()
  GFS2: use memchr_inv
  GFS2: use kmalloc for lvb bitmap

firmware,IB/qib: revert firmware file move

Commit e2eed58b4fbf ("IB/qib: change QLogic to Intel") moved a firmware
file potentially breaking the ABI.

This patch reverts that aspect of the fix as well as reverting the
firmware name as used in qib.

Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'spi-fix-v3.9-rc5' of git://git./linux/kernel/git/broonie/misc

Pull spi fixes from Mark Brown:
"A bunch of small driver fixes plus a fix for error handling in the
  core - nothing too exciting overall."

* tag 'spi-fix-v3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc:
  spi/mpc512x-psc: optionally keep PSC SS asserted across xfer segmensts
  spi: Unlock a spinlock before calling into the controller driver.
  spi/s3c64xx: modified error interrupt handling and init
  spi/bcm63xx: don't disable non enabled clocks in probe error path
  spi/bcm63xx: Remove unused variable
  spi: slink-tegra20: move runtime pm calls to transfer_one_message

GFS2: Issue discards in 512b sectors

This patch changes GFS2's discard issuing code so that it calls
function sb_issue_discard rather than blkdev_issue_discard. The
code was calling blkdev_issue_discard and specifying the correct
sector offset and sector size, but blkdev_issue_discard expects
these values to be in terms of 512 byte sectors, even if the native
sector size for the device is different. Calling sb_issue_discard
with the BLOCK size instead ensures the correct block-to-512b-sector
translation. I verified that "minlen" is specified in blocks, so
comparing it to a number of blocks is correct.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

Revert "drivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR"

This reverts commit 0ef1594c017521ea89278e80fe3f80dafb17abde.

This patch introduced a few races which cannot be easily fixed with a
small follow-up patch. Furthermore, the SoC with the broken hardware
register, which this patch intended to add support for, can only be used
with device trees, which this driver currently does not support.

[ Here is the discussion that led to this "revert" patch:
https://lkml.org/lkml/2013/4/3/176 ]

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fbdev-fixes-3.9-rc6' of git://gitorious.org/linux-omap-dss2/linux

Pull fbdev fixes from Tomi Valkeinen:
"Fix uvesafb crash bug and typoed flag name in fbmon's new videomode
  code"

* tag 'fbdev-fixes-3.9-rc6' of git://gitorious.org/linux-omap-dss2/linux:
  video:uvesafb: Fix dereference NULL pointer code path
  fbmon: use VESA_DMT_VSYNC_HIGH to fix typo

Merge tag 'sound-3.9' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This contains slightly more volumes than usual at this stage, mostly
  because of my vacation in the last week.  Nothing to scare, all small
  and/or trivial fixes:

   - Fix loop path handling in ASoC DAPM
   - Some memory handling fixes in ASoC core
   - Fix spear_pcm to adapt to the updated API
   - HD-audio HDMI ELD handling fixes
   - Fix for CM6331 USB-audio SRC change bugs
   - Revert power_save_controller option change due to user-space usage
   - A few other small ASoC and HD-audio fixes"

* tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/generic - fix uninitialized variable
  Revert "ALSA: hda - Allow power_save_controller option override DCAPS"
  ALSA: hda - fix typo in proc output
  ALSA: hda - Enabling Realtek ALC 671 codec
  ALSA: usb: Work around CM6631 sample rate change bug
  ALSA: hda - bug fix on HDMI ELD debug message
  ALSA: hda - bug fix on return value when getting HDMI ELD info
  ASoC: dma-sh7760: Fix compile error
  ASoC: core: fix invalid free of devm_ allocated data
  ASoC: spear_pcm: Update to new pcm_new() API
  ASoC:: max98090: Remove executable bit
  ASoC: dapm: Fix pointer dereference in is_connected_output_ep()
  ASoC: pcm030 audio fabric: remove __init from probe
  ASoC: imx-ssi: Fix occasional AC97 reset failure
  ASoC: core: fix possible memory leak in snd_soc_bytes_put()
  ASoC: wm_adsp: fix possible memory leak in wm_adsp_load_coeff()
  ASoC: dapm: Fix handling of loops
  ASoC: si476x: Add missing break for SNDRV_PCM_FORMAT_S8 switch case

dm cache: reduce bio front_pad size in writeback mode

A recent patch to fix the dm cache target's writethrough mode extended
the bio's front_pad to include a 1056-byte struct dm_bio_details.
Writeback mode doesn't need this, so this patch reduces the
per_bio_data_size to 16 bytes in this case instead of 1096.

The dm_bio_details structure was added in "dm cache: fix writes to
cache device in writethrough mode" which fixed commit e2e74d617e ("dm
cache: fix race in writethrough implementation"). In writeback mode
we avoid allocating the writethrough-specific members of the
per_bio_data structure (the dm_bio_details structure included).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm cache: fix writes to cache device in writethrough mode

The dm-cache writethrough strategy introduced by commit e2e74d617eadc15
("dm cache: fix race in writethrough implementation") issues a bio to
the origin device, remaps and then issues the bio to the cache device.
This more conservative in-series approach was selected to favor
correctness over performance (of the previous parallel writethrough).
However, this in-series implementation that reuses the same bio to write
both the origin and cache device didn't take into account that the block
layer's req_bio_endio() modifies a completing bio's bi_sector and
bi_size. So the new writethrough strategy needs to preserve these bio
fields, and restore them before submission to the cache device,
otherwise nothing gets written to the cache (because bi_size is 0).

This patch adds a struct dm_bio_details field to struct per_bio_data,
and uses dm_bio_record() and dm_bio_restore() to ensure the bio is
restored before reissuing to the cache device. Adding such a large
structure to the per_bio_data is not ideal but we can improve this
later, for now correctness is the important thing.

This problem initially went unnoticed because the dm-cache test-suite
uses a linear DM device for the dm-cache device's origin device.
Writethrough worked as expected because DM submits a *clone* of the
original bio, so the original bio which was reused for the cache was
never touched.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

MIPS: Delete definition of SA_RESTORER.

SA_RESTORER used to be defined as 0x04000000 but only the O32 ABI ever
supported its use and no libc was using it, so the entire sa-restorer
functionality was removed with lmo commit 39bffc12c3580ab [Zap sa_restorer.]
for 2.5.48 retaining only the SA_RESTORER definition as a reminder to avoid
accidental reuse of the mask bit.

Upstream cdef9602fbf1871a43f0f1b5cea10dd0f275167d [signal: always clear
sa_restorer on execve] adds code that assumes sa_sigaction has an
sa_restorer field, if SA_RESTORER is defined which would break MIPS.
So remove the SA_RESTORER definition before the v3.8.4 merge.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
(cherry picked from commit 17da8d63add23830892ac4dc2cbb3b5d4ffb79a8)

MIPS: Fix ISA level which causes secondary cache init bypassing and more

The commit a96102be70 introduced set_isa() where compatible ISA info is
also set aside from the one gets passed in. It means, for example, 1004K
will have MIPS_CPU_ISA_M32R2/M32R1/II/I flags. This leads to things like
the following inappropriate:

if (c->isa_level == MIPS_CPU_ISA_M32R1 ||
    c->isa_level == MIPS_CPU_ISA_M32R2 ||
    c->isa_level == MIPS_CPU_ISA_M64R1 ||
    c->isa_level == MIPS_CPU_ISA_M64R2)

This patch fixes it.

Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Fix build error cavium-octeon without CONFIG_SMP

Singed-off-by: EunBong Song <eunb.song@samsung.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Kconfig: Rename SNIPROM too

CONFIG_SNIPROM was renamed to CONFIG_FW_SNIPROM in v3.8. Let's rename
SNIPROM itself too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-mips@linux-mips.org;
Cc: linux-kernel@vger.kernel.org
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Alchemy: Fix typo "CONFIG_DEBUG_PCI"

Commit 7517de348663b08a808aff44b5300e817157a568 ("MIPS: Alchemy: Redo
PCI as platform driver") added a reference to CONFIG_DEBUG_PCI. Change
it to CONFIG_PCI_DEBUG, as that is a valid Kconfig macro.

Also add a newline to a debugging printk that this fix enables.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Unbreak function tracer for 64-bit kernel.

Commit 58b69401c797 [MIPS: Function tracer: Fix broken function tracing]
completely broke the function tracer for 64-bit kernels. The symptom is
a system hang very early in the boot process.

The fix: Remove/fix $sp adjustments for 64-bit case.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: Al Cooper <alcooperx@gmail.com>
Cc: viric@viric.name
Cc: stable@vger.kernel.org # 3.8.x
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

ALSA: hda/generic - fix uninitialized variable

changed is not initialized in path_power_down_sync, but it is expected
to be false in case no change happened in the loop. So set it to
false.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

net: ipv4: notify when address lifetime changes

if userspace changes lifetime of address, send netlink notification and
call notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix registration order of driver and DCA nofitication

ixgbe_notify_dca cannot be called before driver registration
because it expects driver's klist_devices to be allocated and
initialized. While on it make sure debugfs files are removed
when registration fails.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: If we don't care about credentials coallesce all messages

It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.

The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.

The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.

Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.

I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.

Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL"

This reverts commit 14134f6584212d585b310ce95428014b653dfaf6.

The problem that the above patch was meant to address is that af_unix
messages are not being coallesced because we are sending unnecesarry
credentials. Not sending credentials in maybe_add_creds totally
breaks unconnected unix domain sockets that wish to send credentails
to other sockets.

In practice this break some versions of udev because they receive a
message and the sending uid is bogus so they drop the message.

Reported-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove sysfs before removing devices

We have a race condition if we try to rmmod bonding and simultaneously add
a bond master through sysfs. In bonding_exit() we first remove the devices
(through rtnl_link_unregister() ) and only after that we remove the sysfs.
If we manage to add a device through sysfs after that the devices were
removed - we'll end up with that device/sysfs structure and with the module
unloaded.

Fix this by first removing the sysfs and only after that calling
rtnl_link_unregister().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1e: limit gso segment size to prevent generation of wrong ip length fields

The limit of 0x3c00 is taken from the windows driver.

Suggested-by: Huang, Xiong <xiong@qca.qualcomm.com>
Cc: Huang, Xiong <xiong@qca.qualcomm.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: count hw_addr syncs so that unsync works properly.

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices. In
some cases (bond/team) a single address list is synched down
to multiple devices. At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'pm+acpi-3.9-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:

- Revert of a recent cpuidle change that caused Nehalem machines to
   hang on boot from Alex Shi.

- USB power management fix addressing a crash in the port device
   object's release routine from Rafael J Wysocki.

- Device PM QoS fix for a potential deadlock related to sysfs interface
   from Rafael J Wysocki.

- Fix for a cpufreq crash when the /cpus Device Tree node is missing
   from Paolo Pisati.

- Fix for a build issue on ia64 related to the Boot Graphics Resource
   Table (BGRT) from Tony Luck.

- Two fixes for ACPI handles being set incorrectly for device objects
   that don't correspond to any ACPI namespace nodes in the I2C and SPI
   subsystems from Rafael J Wysocki.

- Fix for compiler warnings related to CONFIG_PM_DEVFREQ being unset
   from Rajagopal Venkat.

- Fix for a symbol definition typo in cpufreq_governor.h from Borislav
   Petkov.

* tag 'pm+acpi-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / BGRT: Don't let users configure BGRT on non X86 systems
  cpuidle / ACPI: recover percpu ACPI processor cstate
  ACPI / I2C: Use parent's ACPI_HANDLE() in acpi_i2c_register_devices()
  cpufreq: Correct header guards typo
  ACPI / SPI: Use parent's ACPI_HANDLE() in acpi_register_spi_devices()
  cpufreq: check OF node /cpus presence before dereferencing it
  PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset
  PM / QoS: Avoid possible deadlock related to sysfs access
  USB / PM: Don't try to hide PM QoS flags from usb_port_device_release()

r8169: fix auto speed down issue

It would cause no link after suspending or shutdowning when the
nic changes the speed to 10M and connects to a link partner which
forces the speed to 100M.

Check the link partner ability to determine which speed to set.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://1984.lsi.us.es/nf

Pablo Neira Ayuso says:

====================
The following patchset contains netfilter updates for your net tree,
they are:

* Fix missing the skb->trace reset in nf_reset, noticed by Gao Feng
  while using the TRACE target with several net namespaces.

* Fix prefix translation in IPv6 NPT if non-multiple of 32 prefixes
  are used, from Matthias Schiffer.

* Fix invalid nfacct objects with empty name, they are now rejected
  with -EINVAL, spotted by Michael Zintakis, patch from myself.

* A couple of fixes for wrong return values in the error path of
  nfnetlink_queue and nf_conntrack, from Wei Yongjun.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless into wireless

John W. Linville says:

====================
Here are some more fixes intended for the 3.9 stream...

Regarding the mac80211 bits, Johannes says:

"I had changed the idle handling to simplify it, but broken the
sequencing of commands, at least for ath9k-htc, one patch restores the
sequence. The other patch fixes a crash Jouni found while stress-testing
the remain-on-channel code, when an item is deleted the work struct can
run twice and crash the second time."

As for the iwlwifi bits, Johannes says:

"The only fix here is to the passive-no-RX firmware regulatory
enforcement driver support code to not drop auth frames in quick
succession, leading to not being able to connect to APs on passive
channels in certain circumstances."

Don't forget the NFC bits, about which Samuel says:

"This time we have:

- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
  is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
  not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
  APIs have been merged into char-misc-next."

On top of that, Stone Piao provides an mwifiex fix to avoid accessing
beyond the end of a buffer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mm: prevent mmap_cache race in find_vma()

find_vma() can be called by multiple threads with read lock
held on mm->mmap_sem and any of them can update mm->mmap_cache.
Prevent compiler from re-fetching mm->mmap_cache, because other
readers could update it in the meantime:

               thread 1                             thread 2
                                        |
  find_vma()                            |  find_vma()
    struct vm_area_struct *vma = NULL;  |
    vma = mm->mmap_cache;               |
    if (!(vma && vma->vm_end > addr     |
        && vma->vm_start <= addr)) {    |
                                        |    mm->mmap_cache = vma;
    return vma;                         |
     ^^ compiler may optimize this      |
        local variable out and re-read  |
        mm->mmap_cache                  |

This issue can be reproduced with gcc-4.8.0-1 on s390x by running
mallocstress testcase from LTP, which triggers:

  kernel BUG at mm/rmap.c:1088!
    Call Trace:
     ([<000003d100c57000>] 0x3d100c57000)
      [<000000000023a1c0>] do_wp_page+0x2fc/0xa88
      [<000000000023baae>] handle_pte_fault+0x41a/0xac8
      [<000000000023d832>] handle_mm_fault+0x17a/0x268
      [<000000000060507a>] do_protection_exception+0x1e2/0x394
      [<0000000000603a04>] pgm_check_handler+0x138/0x13c
      [<000003fffcf1f07a>] 0x3fffcf1f07a
    Last Breaking-Event-Address:
      [<000000000024755e>] page_add_new_anon_rmap+0xc2/0x168

Thanks to Jakub Jelinek for his insight on gcc and helping to
track this down.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs

Pull UBIFS fix from Artem Bityutskiy:
"Make the space fixup feature work in the case when the file-system is
first mounted R/O and then remounted R/W."

* tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs:
UBIFS: make space fixup work in the remount case

Merge branch 'pm-fixes' into fixes

* pm-fixes:
  cpufreq: Correct header guards typo
  cpufreq: check OF node /cpus presence before dereferencing it
  PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset
  PM / QoS: Avoid possible deadlock related to sysfs access
  USB / PM: Don't try to hide PM QoS flags from usb_port_device_release()

Merge branch 'acpi-fixes' into fixes

* acpi-fixes:
  ACPI / BGRT: Don't let users configure BGRT on non X86 systems
  cpuidle / ACPI: recover percpu ACPI processor cstate
  ACPI / I2C: Use parent's ACPI_HANDLE() in acpi_i2c_register_devices()
  ACPI / SPI: Use parent's ACPI_HANDLE() in acpi_register_spi_devices()

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

- Workaround for device ID conflict between Masterkit MA901 usb radio
   device and Atmel V-USB devices, to avoid regressions from older
   kernels, by Alexey Klimov

- fix for possible race during input device registration in magicmouse
   driver, by Benjamin Tissoires

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: magicmouse: fix race between input_register() and probe()
  media: radio-ma901: return ENODEV in probe if usb_device doesn't match
  HID: fix Masterkit MA901 hid quirks

Merge tag 'gpio-fixes-v3.9' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Two GPIO fixes for the v3.9 series:
   - Fix erroneous return value in the ICH driver
   - Make the STMPE driver proper properly on device tree boots"

* tag 'gpio-fixes-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: stmpe: pass DT node to irqdomain
  gpio-ich: Fix value returned by ichx_gpio_request

Revert "ALSA: hda - Allow power_save_controller option override DCAPS"

This reverts commit 6ab317419c62850a71e2adfd1573e5ee87d8774f.

The commit [6ab317419c: ALSA: hda - Allow power_save_controller option
override DCAPS] changed the behavior of power_save_controller so that
it can override the driver capability.  This assumed that this option
is rarely changed dynamically unlike power_save option.  Too naive.

It turned out that the user-space power-management tool tries to set
power_save_controller option to 1 together with power_save option
without knowing what's actually doing.  This enabled forcibly the
runtime PM of the controller,  which is known to be broken om many
chips thus disabled as default.

So, the only sane fix is to revert this commit again.  It was intended
to ease debugging/testing for runtime PM enablement, but obviously we
need another way for it.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=56171
Reported-and-tested-by: Nikita Tsukanov <keks9n@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - fix typo in proc output

Rename "Digitial In" to "Digital In". This function is only used for
proc output, so should not cause any problems to change.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

GFS2: Fix unlock of fcntl locks during withdrawn state

When withdraw occurs, we need to continue to allow unlocks of fcntl
locks to occur, however these will only be local, since the node has
withdrawn from the cluster. This prevents triggering a VFS level
bug trap due to locks remaining when a file is closed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: return error if malloc failed in gfs2_rs_alloc()

The error code in gfs2_rs_alloc() is set to ENOMEM when error
but never be used, instead, gfs2_rs_alloc() always return 0.
Fix to return 'error'.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: use memchr_inv

Use memchr_inv to verify that the specified memory range is cleared.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>

GFS2: use kmalloc for lvb bitmap

The temp lvb bitmap was on the stack, which could
be an alignment problem for __set_bit_le. Use
kmalloc for it instead.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

HID: magicmouse: fix race between input_register() and probe()

Since kernel 3.7, it appears that the input registration occured before
the end of magicmouse_setup_input(). This is shown by receiving a lot of
"EV_SYN SYN_REPORT 1" instead of normal "EV_SYN SYN_REPORT 0".
This value means that the output buffer is full, and the user space
is loosing events.

Using .input_configured guarantees that the race is not occuring, and that
the call of "input_set_events_per_packet(input, 60)" is taken into account
by input_register().

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=908604

Cc: stable@vger.kernel.org
Reported-and-Tested-By: Clarke Wixon <cwixon@usa.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

ALSA: hda - Enabling Realtek ALC 671 codec

* Added the device ID to the modalias list and assinged ALC662 patches
for it
* Added 4 port support for the device ID 0671 in alc662_parse_auto_config

Signed-off-by: Rainer Koenig <Rainer.Koenig@ts.fujitsu.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"Another round of ARM fixes, which include:
   - Fixing a problem with LPAE mapping sections
   - Reporting of some hwcaps on Krait CPUs
   - Avoiding repetitive warnings in the breakpoint code
   - Fixing a build error noticed on Dove platforms with PJ4 CPUs
   - Fix masking of level 2 cache revision.
   - Fixing timer-based udelay()
   - A larger fix for an erratum causing people major grief with Cortex
     A15 CPUs"

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7690/1: mm: fix CONFIG_LPAE typos
  ARM: 7689/1: add unwind annotations to ftrace asm
  ARM: 7685/1: delay: use private ticks_per_jiffy field for timer-based delay ops
  ARM: 7684/1: errata: Workaround for Cortex-A15 erratum 798181 (TLBI/DSB operations)
  ARM: 7682/1: cache-l2x0: fix masking of RTL revision numbering and set_debug init
  ARM: iWMMXt: always enable iWMMXt support with PJ4 CPUs
  ARM: 7681/1: hw_breakpoint: use warn_once to avoid spam from reset_ctrl_regs()
  ARM: 7678/1: Work around faulty ISAR0 register in some Krait CPUs
  ARM: 7680/1: Detect support for SDIV/UDIV from ISAR0 register
  ARM: 7679/1: Clear IDIVT hwcap if CONFIG_ARM_THUMB=n
  ARM: 7677/1: LPAE: Fix mapping in alloc_init_section for unaligned addresses
  ARM: KVM: vgic: take distributor lock on sync_hwstate path
  ARM: KVM: vgic: force EOIed LRs to the empty state

PCI/PM: Disable runtime PM of PCIe ports

The runtime PM of PCIe ports turns out to be quite fragile, as in
some cases things work while in some other cases they don't and we
don't seem to have a good way to determine whether or not they are
going to work in advance.

For this reason, avoid enabling runtime PM for PCIe ports by
keeping their runtime PM reference counters always above 0 for the
time being.

When a PCIe port is suspended, it can no longer report events like
hotplug, so hotplug below the port may not work, as in the bug
report below.

[bhelgaas: changelog, stable]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=53811
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.6+

PCI/ACPI: Always resume devices on ACPI wakeup notifications

It turns out that the _Lxx control methods provided by some BIOSes
clear the PME Status bit of PCI devices they handle, which means that
pci_acpi_wake_dev() cannot really use that bit to check whether or
not the device has signalled wakeup.

One symptom of the problem is, for example, that when an affected PCI
USB controller is runtime-suspended, then plugging in a new USB device
into one of the controller's ports will not wake up the controller,
which should happen.

For this reason, make pci_acpi_wake_dev() always attempt to resume
the device it is called for regardless of the device's PME Status bit
value (that bit still has to be cleared if set at this point,
though).

Reported-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
CC: stable@vger.kernel.org # v3.7+

Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Unfortunately, we introduced some big-endian bugs during the last
  merge window.  Fortunately, Cai and Christian noticed before 3.9
  shipped."

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix big-endian bugs which could cause fs corruptions

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs

Pull reiserfs fix from Jan Kara:
"A fix for reiserfs xattr bug exposed by changes to lookup_one_len()"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: Fix warning and inode leak when deleting inode with xattrs

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Just a bunch of bugfixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: provide emtpy check_pgt_cache() function
  s390/uaccess: fix page table walk
  s390/3270: fix minor_start issue
  s390/uaccess: fix clear_user_pt()
  s390/scm_blk: fix error return code in scm_blk_init()
  s390/scm_block: fix printk format string
  drivers/Kconfig: add several missing GENERIC_HARDIRQS dependencies

ext4: fix big-endian bugs which could cause fs corruptions

When an extent was zeroed out, we forgot to do convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out.  So
fix it.

[ Also fix a bug found by Dmitry Monakhov where we were missing
  le32_to_cpu() calls in the new indirect punch hole code.

  There are a number of other big endian warnings found by static code
  analyzers, but we'll wait for the next merge window to fix them all
  up.  These fixes are designed to be Obviously Correct by code
  inspection, and easy to demonstrate that it won't make any
  difference (and hence, won't introduce any bugs) on little endian
  architectures such as x86.  --tytso ]

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>

ARM: 7690/1: mm: fix CONFIG_LPAE typos

CONFIG_LPAE doesn't exist: the correct option is CONFIG_ARM_LPAE, so fix
up the two typos under arch/arm/.

The fix to head.S is slightly scary, but this is just for setting up
an early io-mapping for the serial port when running on a big-endian,
LPAE system. Since these systems don't exist in the wild (at least, I
have no access to one outside of kvmtool, which doesn't provide a serial
port suitable for earlyprintk), then we can revisit the code later if it
causes any problems.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 7689/1: add unwind annotations to ftrace asm

Add unwind annotations to the ftrace assembly code so that the function
tracer's stacktracing options (func_stack_trace, etc.) work when
CONFIG_ARM_UNWIND is enabled.

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 7685/1: delay: use private ticks_per_jiffy field for timer-based delay ops

Commit 70264367a243 ("ARM: 7653/2: do not scale loops_per_jiffy when
using a constant delay clock") fixed a problem with our timer-based
delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly
by the timer delay ops.

This patch fixes the problem in a more elegant way by keeping a private
ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy
and therefore not subject to scaling. The loop-based delay continues to
use loops_per_jiffy directly, as it should.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 7684/1: errata: Workaround for Cortex-A15 erratum 798181 (TLBI/DSB operations)

On Cortex-A15 (r0p0..r3p2) the TLBI/DSB are not adequately shooting down
all use of the old entries. This patch implements the erratum workaround
which consists of:

1. Dummy TLBIMVAIS and DSB on the CPU doing the TLBI operation.
2. Send IPI to the CPUs that are running the same mm (and ASID) as the
one being invalidated (or all the online CPUs for global pages).
3. CPU receiving the IPI executes a DMB and CLREX (part of the exception
return code already).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 7682/1: cache-l2x0: fix masking of RTL revision numbering and set_debug init

Commit b8db6b8 (ARM: 7547/4: cache-l2x0: add support for Aurora L2 cache
ctrl) moved the masking of the part ID which caused the RTL version to be
lost. Commit 6248d06 (ARM: 7545/1: cache-l2x0: make outer_cache_fns a
field of l2x0_of_data) changed how .set_debug is initialized. Both commits
break commit 74ddcdb (ARM: 7608/1: l2x0: Only set .set_debug
on PL310 r3p0 and earlier) which uses the RTL version to conditionally set
.set_debug function pointer. Commit b8db6b8 also caused the printed cache
ID to be missing the version information.

Fix this by reverting how the part number is masked so the RTL version
info is maintained. The cache-id-part DT property does not set the RTL
bits so masking them should have no effect. Also, re-arrange the order
of the function pointer init so the .set_debug function can be overridden.

Reported-by: Paolo Pisati <paolo.pisati@canonical.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: Yehuda Yitschak <yehuday@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: iWMMXt: always enable iWMMXt support with PJ4 CPUs

Jason Cooper reports these build errors:
arch/arm/kernel/built-in.o: In function `iwmmxt_do':
/.../arch/arm/kernel/pj4-cp0.c:36: undefined reference to `iwmmxt_task_release'
/.../arch/arm/kernel/pj4-cp0.c:40: undefined reference to `iwmmxt_task_switch'
make: *** [vmlinux] Error 1

This is caused because the PJ4 code explicitly references the iWMMXt
code, but doesn't require it to be built. Fix this by ensuring that
iWMMXt is always enabled with PJ4.

Reported-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ALSA: usb: Work around CM6631 sample rate change bug

The C-Media CM6631 USB receiver doesn't respond to changes in sample rate
while the interface is active. The same behavior is observed in other UAC2
hardware like the VIA VT1731.

Reset the interface after setting the sampling frequency on sample rate
changes, to ensure that the sample rate set by snd_usb_init_sample_rate() is
used. Otherwise, the device will try to use the sample rate of the previous
stream, causing distorted sound on sample rate changes.

The reset is performed for all UAC2 devices, as it should not affect a
standards compliant device, but it is only necessary for C-Media CM6631,
VIA VT1731 and possibly others.

Failure to read sample rate from the device is not handled as an error in
set_sample_rate_v2(), as (permanent or intermittent) failure to read sample
rate isn't essential for a successful sample rate set.

Signed-off-by: Torstein Hegge <hegge@resisty.net>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ACPI / BGRT: Don't let users configure BGRT on non X86 systems

Fengguang Wu's 0-Day kernel build testing backend found the
following build error for an allmodconfig build on ia64:

   drivers/built-in.o: In function `show_yoffset':
>> bgrt.c:(.text+0xe5a71): undefined reference to `bgrt_tab'
>> bgrt.c:(.text+0xe5a91): undefined reference to `bgrt_tab'
   drivers/built-in.o: In function `show_xoffset':
>> bgrt.c:(.text+0xe5b51): undefined reference to `bgrt_tab'
>> bgrt.c:(.text+0xe5b71): undefined reference to `bgrt_tab'
   drivers/built-in.o: In function `show_type':
>> bgrt.c:(.text+0xe5c31): undefined reference to `bgrt_tab'
   drivers/built-in.o:bgrt.c:(.text+0xe5c51): more undefined references to `bgrt_tab' follow
   drivers/built-in.o: In function `bgrt_init':
   bgrt.c:(.init.text+0x8931): undefined reference to `bgrt_image'
   bgrt.c:(.init.text+0x8932): undefined reference to `bgrt_image_size'
   bgrt.c:(.init.text+0x8950): undefined reference to `bgrt_image'
   bgrt.c:(.init.text+0x8960): undefined reference to `bgrt_image_size'

The problem is that all these undefined names are provided by
arch/x86/platform/efi/efi-bgrt.c - which is obviously not available
to the ia64 build.

It doesn't seem useful to provide the BGRT support for Itanium
(many systems are headless and have no graphics at all). So
just don't let users configure this driver on non-X86 machines.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths

The bitmask used for the prefix mangling was being calculated
incorrectly, leading to the wrong part of the address being replaced
when the prefix length wasn't a multiple of 32.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix VSOCK layer handling of context ID changes, from Reilly Grant.

2) Now that we have a synchronize_net() in netdev_rx_handler_unregister(),
    we can't let any call sites hold locks.  Unfortunately bonding does,
    so we have to drop the rwlock there a little bit earlier, fix from
    Veaceslav Falico.

3) MAC address setting loop exits one iteration too early in mlx4
    driver, from Yan Burman.

4) Restore ipv6 routes properly upon ifdown/ifup of loopback, from
    Balakumaran Kannan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  VSOCK: Handle changes to the VMCI context ID.
  net IPv6 : Fix broken IPv6 routing table after loopback down-up
  cbq: incorrect processing of high limits
  net/mlx4_en: Fix setting initial MAC address
  bonding: get netdev_rx_handler_unregister out of locks

Merge tag 'regmap-v3.9-rc4' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"A small collection of fixes.  The most important ones are those from
  Stephen and Lars-Peter both of which fix cache issues that have been
  lurking for a while but not manifesting noticably enough for anyone to
  report them."

* tag 'regmap-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: async: Add missing return
  regmap: don't corrupt work buffer in _regmap_raw_write()
  regmap: cache Fix regcache-rbtree sync
  regmap: Initialize `map->debugfs' before regcache

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull DRM fixes from Dave Airlie:
"Two core fixes, both regressions, along with some intel and some
  nouveau fixes for regressions and oopses"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm: correctly restore mappings if drm_open fails
  drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
  drm/nouveau: fix handling empty channel list in ioctl's
  drm: don't unlock in the addfb error paths
  drm/i915: Fix build failure
  drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
  drm/i915: duct-tape locking when eDP init fails

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"A collection of fixes pretty much across the MIPS code.  Even the
  change to include/linux/signal.h by David Howells' 2a1486981c13 ("Fix
  breakage in MIPS siginfo handling") should be considered MIPS-specific
  as it touches an ifdefed segment that is only relevant to MIPS and
  which unfortunately can't be made to go away entirely."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  Fix breakage in MIPS siginfo handling
  Revert "MIPS: BCM63XX: Call board_register_device from device_initcall()"
  MIPS: BCM63XX: Make nvram checksum failure non fatal
  MIPS: Fix code generation for non-DSP capable CPUs
  MIPS: Fix inconsistent formatting inside /proc/cpuinfo
  MIPS: SEAD3: Enable LL/SC.
  MIPS: Get rid of CONFIG_CPU_HAS_LLSC again
  MIPS: Add dependencies for HAVE_ARCH_TRANSPARENT_HUGEPAGE
  MIPS: VR4133: Fix probe for LL/SC.
  MIPS: Fix logic errors in bitops.c
  MIPS: Use CONFIG_CPU_MIPSR2 in csum_partial.S
  MIPS: compat: Return same error ENOSYS as native for invalid operation.

PCI: Don't try to disable Bus Master on disconnected PCI devices

This is a fix for commit 7897e60227 ("PCI: Disable Bus Master
unconditionally in pci_device_shutdown()"). Vivek reported that
with this commit, kexec failed because none of his SATA disks
came up.

A ->shutdown() callback might put the device in D3cold, which means config
space is no longer available.

[bhelgaas: changelog]
Link: https://lkml.org/lkml/2013/3/12/529
Reported-and-Tested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"

This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.

Conflicts:
drivers/acpi/pci_root.c

This commit broke some pre-1.1 PCIe devices by leaving them with
ASPM enabled.  Previously, we had disabled ASPM on these devices
because many of them don't implement it correctly (per 149e1637).

Requesting _OSC control early means that aspm_disabled may be set
before we scan the PCI bus and configure link ASPM state.  But the
ASPM configuration currently skips the check for pre-PCIe 1.1 devices
when aspm_disabled is set, like this:

    acpi_pci_root_add
      acpi_pci_osc_support
        if (flags != base_flags)
          pcie_no_aspm
            aspm_disabled = 1
      pci_acpi_scan_root
        ...
          pcie_aspm_init_link_state
            pcie_aspm_sanity_check
              if (!aspm_disabled)
                /* check for pre-PCIe 1.1 device */

Therefore, setting aspm_disabled early means that we leave ASPM enabled
on these pre-PCIe 1.1 devices, which is a regression for some devices.

The best fix would be to clean up the ASPM init so we can evaluate
_OSC before scanning the bug (that way boot-time and hot-add discovery
will work the same), but that requires significant rework.

For now, we'll just revert the _OSC change as the lowest-risk fix.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
CC: stable@vger.kernel.org # v3.8+

Merge branch 'pci/yinghai-eisa' into for-linus

* pci/yinghai-eisa:
EISA/PCI: Init EISA early, before PNP
EISA/PCI: Fix bus res reference

x86: remove the x32 syscall bitmask from syscall_get_nr()

Commit fca460f95e928bae373daa8295877b6905bc62b8 simplified the x32
implementation by creating a syscall bitmask, equal to 0x40000000, that
could be applied to x32 syscalls such that the masked syscall number
would be the same as a x86_64 syscall.  While that patch was a nice
way to simplify the code, it went a bit too far by adding the mask to
syscall_get_nr(); returning the masked syscall numbers can cause
confusion with callers that expect syscall numbers matching the x32
ABI, e.g. unmasked syscall numbers.

This patch fixes this by simply removing the mask from syscall_get_nr()
while preserving the other changes from the original commit.  While
there are several syscall_get_nr() callers in the kernel, most simply
check that the syscall number is greater than zero, in this case this
patch will have no effect.  Of those remaining callers, they appear
to be few, seccomp and ftrace, and from my testing of seccomp without
this patch the original commit definitely breaks things; the seccomp
filter does not correctly filter the syscalls due to the difference in
syscall numbers in the BPF filter and the value from syscall_get_nr().
Applying this patch restores the seccomp BPF filter functionality on
x32.

I've tested this patch with the seccomp BPF filters as well as ftrace
and everything looks reasonable to me; needless to say general usage
seemed fine as well.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Link: http://lkml.kernel.org/r/20130215172143.12549.10292.stgit@localhost
Cc: <stable@vger.kernel.org>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

drm: correctly restore mappings if drm_open fails

If first drm_open fails, the error-handling path will
incorrectly restore inode's mapping to NULL. This can
cause the crash later on. Fix by separately storing
away mapping pointers that drm_open can touch and
restore each from its own respective variable if the
call fails.

Fixes: https://bugzilla.novell.com/show_bug.cgi?id=807850
(thanks to Michal Hocko for investigating investigating and
finding the root cause of the bug)

Reference:
http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html

v2: Use one variable to store file and inode mapping
since they are the same at the function entry.
Fix spelling mistakes in commit message.

v3: Add reference to the original bug report.

Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Tested-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next

Oops fixers.
* 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
drm/nouveau: fix handling empty channel list in ioctl's

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next

One locking regression fix, and a couple of other i915 ones.

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm: don't unlock in the addfb error paths
  drm/i915: Fix build failure
  drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
  drm/i915: duct-tape locking when eDP init fails

VSOCK: Handle changes to the VMCI context ID.

The VMCI context ID of a virtual machine may change at any time. There
is a VMCI event which signals this but datagrams may be processed before
this is handled. It is therefore necessary to be flexible about the
destination context ID of any datagrams received. (It can be assumed to
be correct because it is provided by the hypervisor.) The context ID on
existing sockets should be updated to reflect how the hypervisor is
currently referring to the system.

Signed-off-by: Reilly Grant <grantr@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net IPv6 : Fix broken IPv6 routing table after loopback down-up

IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.

This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbq: incorrect processing of high limits

currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link

In shaper | Actual Result
-----------+---------------
  100M     | 108 Mbps
  200M     | 244 Mbps
  300M     | 412 Mbps
  500M     | 893 Mbps

This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.

To fix this problem we prevent change of q->now until its synchronization
with real time.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipc: set msg back to -EAGAIN if copy wasn't performed

Make sure that msg pointer is set back to error value in case of
MSG_COPY flag is set and desired message to copy wasn't found. This
garantees that msg is either a error pointer or a copy address.

Otherwise the last message in queue will be freed without unlinking from
the queue (which leads to memory corruption) and the dummy allocated
copy won't be released.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net/mlx4_en: Fix setting initial MAC address

Commit 6bbb6d9 "net/mlx4_en: Optimize Rx fast path filter checks" introduced a regression
under which the MAC address read from the card was not converted correctly
(the most significant byte was not handled), fix that.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: get netdev_rx_handler_unregister out of locks

Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC bug fixes from Arnd Bergmann:
"After a quiet set of fixes for 3.9-rc4, a lot of people woke up and
  sent urgent fixes for 3.9.  I pushed back on a number of them that got
  deferred to 3.10, but these are the ones that seemed important.

  Regression in 3.9:

   - Multiple regressions in OMAP2+ clock cleanup
   - SH-Mobile frame buffer bug fix that merged here because of
     maintainer MIA
   - ux500 prcmu changes broke DT booting
   - MMCI duplicated regulator setup on ux500
   - New ux500 clock driver broke ethernet on snowball
   - Local interrupt driver for mvebu broke ethernet
   - MVEBU GPIO driver did not get set up right on Orion DT
   - incorrect interrupt number on Orion crypto for DT

  Long-standing bugs, including candidates for stable:

   - Kirkwood MMC needs to disable invalid card detect pins
   - MV SDIO pinmux was wrong on Mirabox
   - GoFlex Net board file needs to set NAND chip delay
   - MSM timer restart race
   - ep93xx early debug code broke in 3.7
   - i.MX CPU hotplug race
   - Incorrect clock setup for OMAP1 USB
   - Workaround for bad clock setup by some old OMAP4 boot loaders
   - Static I/O mappings on cns3xxx since 3.2"

* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: cns3xxx: fix mapping of private memory region
  arm: mvebu: Fix pinctrl for Armada 370 Mirabox SDIO port.
  arm: orion5x: correct IRQ used in dtsi for mv_cesa
  arm: orion5x: fix orion5x.dtsi gpio parameters
  ARM: Kirkwood: fix unused mvsdio gpio pins
  arm: mvebu: Use local interrupt only for the timer 0
  ARM: kirkwood: Fix chip-delay for GoFlex Net
  ARM: ux500: Enable the clock controlling Ethernet on Snowball
  ARM: ux500: Stop passing ios_handler() as an MMCI power controlling call-back
  ARM: ux500: Apply the TCPM and TCDM locations and sizes to dbx5x0 DT
  fbdev: sh_mobile_lcdc: fixup B side hsync adjust settings
  ARM: OMAP: clocks: Delay clk inits atleast until slab is initialized
  ARM: imx: fix sync issue between imx_cpu_die and imx_cpu_kill
  ARM: msm: Stop counting before reprogramming clockevent
  ARM: ep93xx: Fix wait for UART FIFO to be empty
  ARM: OMAP4: PM: fix PM regression introduced by recent clock cleanup
  ARM: OMAP3: hwmod data: keep MIDLEMODE in force-standby for musb
  ARM: OMAP4: clock data: lock USB DPLL on boot
  ARM: OMAP1: fix USB host on 1710

Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from J Bruce Fields:
"An xdr decoding error--thanks, Toralf Förster, and Trinity!"

* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
nfsd4: reject "negative" acl lengths

Merge tag 'v3.9-rc1_cns3xxx_fixes' of git://git.infradead.org/users/cbou/linux-cns3xxx into fixes

From Anton Vorontsov <anton@enomsg.org>:

This tag includes Mac Lin's work to revive CNS3xxx booting:

"Since commit 0536bdf33faf (ARM: move iotable mappings within the vmalloc
region), [...] the pre-defined iotable mappings is not in the vmalloc
region. [...] move the iotable mappings into the vmalloc region, and
merge the MPCore private memory region (containing the SCU, the GIC and
the TWD) as a single region."

Plus there is a small cosmetic fix, also from Mac Lin.

* tag 'v3.9-rc1_cns3xxx_fixes' of git://git.infradead.org/users/cbou/linux-cns3xxx:
ARM: cns3xxx: fix mapping of private memory region

[arnd: dropped the cosmetic fix from the merge as it is not needed for 3.9]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

cpuidle / ACPI: recover percpu ACPI processor cstate

Commit ac3ebafa81af76d6 "ACPI / idle: remove usage of the statedata"
changed the percpu processor cstate to a unified cstate in ACPI idle.
That caused all our NHM boxes to boot hang or panic.

2178751 Task dump for CPU 1:
2178752 swapper/1       R  running task     6736     0      1 0x00000000
2178753  ffff8801e8029dc8 ffffffff8101cf96 ffff8801e8029e28 ffffffff813d294b
2178754  0000000000000f99 0000000000000003 00000000003cf654 0000000025c17d03
2178755  ffff8801e8029e38 ffff8801e74fc000 00000002590dc5c4 ffffffff8163cdb0
2178756 Call Trace:
2178757  [<ffffffff8101cf96>] ? acpi_processor_ffh_cstate_enter+0x2d/0x2f
2178758  [<ffffffff813d294b>] acpi_idle_enter_bm+0x1b1/0x236
2178759  [<ffffffff8163cdb0>] ? disable_cpuidle+0x10/0x10
2178760  [<ffffffff8163cdc2>] cpuidle_enter+0x12/0x14
2178761  [<ffffffff8163d286>] cpuidle_wrap_enter+0x2f/0x6d
2178762  [<ffffffff8163d2d4>] cpuidle_enter_tk+0x10/0x12
2178763  [<ffffffff8163cdd6>] cpuidle_enter_state+0x12/0x3a
2178764  [<ffffffff8163d4a7>] cpuidle_idle_call+0xe8/0x161
2178765  [<ffffffff81008d99>] cpu_idle+0x5e/0xa4
2178766  [<ffffffff8174c6c1>] start_secondary+0x1a9/0x1ad
2178767 Task dump for CPU 2:

In fact, the ACPI idle is based on the assumption of difference percpu
cstate structures that are necessary for the implementation to work
cprrectly.  A unique acpi_processor_cx is not sifficient by far.

This patch is just a quick fix re-introducing the percpu cstates.

If someone really wants to unify the ACPI cstates, please make sure
that the whole software infrastructure is changed and take hardware
as well as many different kinds of BIOS settings into account.

[rjw: Changelog]
Reported-by: LKP project <lkp@linux.intel.com>
Reported-by: Xie ChanglongX <changlongx.xie@intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / I2C: Use parent's ACPI_HANDLE() in acpi_i2c_register_devices()

The ACPI handle of struct i2c_adapter's dev member should not be
set, because this causes that struct i2c_adapter to be associated
with the ACPI device node corresponding to its parent as the
second "physical_device", which is incorrect (this happens during
the registration of struct i2c_adapter). Consequently,
acpi_i2c_register_devices() should use the ACPI handle of the
parent of the struct i2c_adapter it is called for rather than the
struct i2c_adapter's ACPI handle (which should be NULL).

Make that happen and modify the i2c-designware-platdrv driver,
which currently is the only driver for ACPI-enumerated I2C
controller chips, not to set the ACPI handle for the
struct i2c_adapter it creates.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>