git.openwrt.org Git - openwrt/staging/blogic.git/log

net/mlx4: Don't disable vxlan offloads under DMFS-A0 optimized steering

Except for VXLAN steering rules, all offloads should work as they were
under plain DMFS mode. Fix that by enabling all the offloads under
DMFS-A0 mode, except for VXLAN steering rules.

Fixes: d57febe1a478 "net/mlx4: Add A0 hybrid steering"
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2015-01-15' of git://git./linux/kernel/git/jberg/mac80211

Just two fixes - one for an uninialized variable and
one for a deadlock in regulatory processing.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sxgbe: Fix waring for double kfree()

This patch fixes double kfree() calls at init_rx_ring() because
it causes static checker warning.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: Kukjin Kim <kgene@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sxgbe: Fix NULL dereferece when using DT

When the MAC address is provided in the device tree file, the
condition is true and kernel crashes due to NULL dereference.

Signed-off-by: Girish K.S <ks.giri@samsung.com>
Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: Kukjin Kim <kgene@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Fix addition of .trscer_err_mask to wrong SoC data

commit b284fbe3b3ef9cf8 ("sh_eth: Fix access to TRSCER register") wanted
to add a .trscer_err_mask value to the R-Car Gen2 family-specific data
structure (r8a779x_data), but it was accidentally added to the
SH7724-specific data structure (sh7724_data).

Presumably this happened due to a patch conflict with commit
d407bc0203539031 ("sh-eth: Set fdr_value of R-Car SoCs"), which added
another field at the same position.

Move the field setting to fix this.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: b284fbe3b3ef9cf8 ("sh_eth: Fix access to TRSCER register")
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: fix cpsw hung with add vlan using vconfig

while adding vlan in dual EMAC mode, only specific ports should be
subscribed for the vlan, else it will lead to switching mode and
if both ports connected to same switch cpsw will hung as it creates
a network loop. Fixing this by adding only specific ports in case
of dual EMAC.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Allow GRE to work concurrently while a VxLAN tunnel is configured

Other tunnels like GRE break while VxLAN offloads are enabled in Skyhawk-R. To
avoid this, we should restrict offload features on a per-packet basis in such
conditions.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Don't use uninitialized data in IPVS, from Dan Carpenter.

2) conntrack race fixes from Pablo Neira Ayuso.

3) Fix TX hangs with i40e, from Jesse Brandeburg.

4) Fix budget return from poll calls in dnet and alx, from Eric
    Dumazet.

5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
    Jaeger.

6) Fix bug introduced by conversion to list_head in TIPC retransmit
    code, from Jon Paul Maloy.

7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
    Khoroshilov.

8) Fix bridge build with INET disabled, from Arnd Bergmann.

9) Fix netlink array overrun for PROBE attributes in openvswitch, from
    Thomas Graf.

10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
    Prashant Sreedharan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  tg3: Release tp->lock before invoking synchronize_irq()
  tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
  tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
  team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
  openvswitch: packet messages need their own probe attribtue
  i40e: adds FCoE configure option
  cxgb4vf: Fix queue allocation for 40G adapter
  netdevice: Add missing parentheses in macro
  bridge: only provide proxy ARP when CONFIG_INET is enabled
  neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
  net: fec: fix MDIO bus assignement for dual fec SoC's
  xen-netfront: use different locks for Rx and Tx stats
  drivers: net: cpsw: fix multicast flush in dual emac mode
  cxgb4vf: Initialize mdio_addr before using it
  net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
  usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
  MAINTAINERS: add me as ibmveth maintainer
  tipc: fix bug in broadcast retransmit code
  update ip-sysctl.txt documentation (v2)
  net/at91_ether: prepare and unprepare clock
  ...

Merge branch 'tg3-net'

Prashant Sreedharan says:

====================
tg3: synchronize_irq() should be called without taking locks

v2: Added Reported-by, Tested-by fields and reference to the thread that
reported the problem

This series addresses the problem reported by Peter Hurley in mail thread
https://lkml.org/lkml/2015/1/12/1082
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Release tp->lock before invoking synchronize_irq()

synchronize_irq() can sleep waiting, for pending IRQ handlers so driver
should release the tp->lock spin lock before invoking synchronize_irq()

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: tg3_reset_task() needs to use rtnl_lock to synchronize

Currently tg3_reset_task() uses only tp->lock for synchronizing with code
paths like tg3_open() etc. But since tp->lock is released before doing
synchronize_irq(), rtnl_lock should be taken in tg3_reset_task() to
synchronize it with other code paths.

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync

This is to avoid the race between tg3_timer() and the execution paths
which does not invoke tg3_timer_stop() and releases tp->lock before
calling synchronize_irq()

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Two bugfixes for arm64.  I will have another pull request next week,
  but otherwise things are calm"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  arm64: KVM: Fix HCR setting for 32bit guests
  arm64: KVM: Fix TLB invalidation by IPA/VMID

team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin

This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").

Consider following scenario:

count_pending == 2
   CPU0                                           CPU1
team_notify_peers_work
  atomic_dec_and_test (dec count_pending to 1)
  schedule_delayed_work
team_notify_peers
   atomic_add (adding 1 to count_pending)
team_notify_peers_work
  atomic_dec_and_test (dec count_pending to 1)
  schedule_delayed_work
team_notify_peers_work
  atomic_dec_and_test (dec count_pending to 0)
   schedule_delayed_work
team_notify_peers_work
  atomic_dec_and_test (dec count_pending to -1)

Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.

Fixes: fc423ff00df3a1955441 ("team: add peer notification")
Fixes: 492b200efdd20b8fcfd ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Two small performance tweaks, the plumbing for the execveat system
  call and a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uprobes: fix user space PER events
  s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)
  s390/bpf: Fix ALU_NEG (A = -A)
  s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS
  s390/timex: fix get_tod_clock_ext() inline assembly
  s390: wire up execveat syscall
  s390/kernel: use stnsm 255 instead of stosm 0
  s390/vtime: Get rid of redundant WARN_ON
  s390/zcrypt: kernel oops at insmod of the z90crypt device driver

openvswitch: packet messages need their own probe attribtue

User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.

Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: adds FCoE configure option

Adds FCoE config option I40E_FCOE, so that FCoE can be enabled
as needed but otherwise have it disabled by default.

This also eliminate multiple FCoE config checks, instead now just
one config check for CONFIG_I40E_FCOE.

The I40E FCoE was added with 3.17 kernel and therefore this patch
shall be applied to stable 3.17 kernel also.

CC: <stable@vger.kernel.org>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Fix queue allocation for 40G adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'locks-v3.19-1' of git://git.samba.org/jlayton/linux

Pull file locking fix from Jeff Layton:
"Just a simple bugfix for a regression that I introduced into v3.18
  with the internal lease API overhaul -- mea culpa.  Kudos to Linda and
  Neil for tracking this down and fixing it"

* tag 'locks-v3.19-1' of git://git.samba.org/jlayton/linux:
  locks: fix NULL-deref in generic_delete_lease

netdevice: Add missing parentheses in macro

For example, one could conceivably call
for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
and get an unexpected result.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
"The major part is an update to the NVMe driver, fixing various issues
  around surprise removal and hung controllers.  Most of that is from
  Keith, and parts are simple blk-mq fixes or exports/additions of minor
  functions to aid this effort, and parts are changes directly to the
  NVMe driver.

  Apart from the above, this contains:

   - Small blk-mq change from me, killing an unused member of the
     hardware queue structure.

   - Small fix from Ming Lei, fixing up a few drivers that didn't
     properly check for ERR_PTR() returns from blk_mq_init_queue()"

* 'for-linus' of git://git.kernel.dk/linux-block:
  NVMe: Fix locking on abort handling
  NVMe: Start and stop h/w queues on reset
  NVMe: Command abort handling fixes
  NVMe: Admin queue removal handling
  NVMe: Reference count admin queue usage
  NVMe: Start all requests
  blk-mq: End unstarted requests on a dying queue
  blk-mq: Allow requests to never expire
  blk-mq: Add helper to abort requeued requests
  blk-mq: Let drivers cancel requeue_work
  blk-mq: Export if requests were started
  blk-mq: Wake tasks entering queue on dying
  blk-mq: get rid of ->cmd_size in the hardware queue
  block: fix checking return value of blk_mq_init_queue
  block: wake up waiters when a queue is marked dying
  NVMe: Fix double free irq
  blk-mq: Export freeze/unfreeze functions
  blk-mq: Exit queue on alloc failure

bridge: only provide proxy ARP when CONFIG_INET is enabled

When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:

net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'

This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbour: fix base_reachable_time(_ms) not effective immediatly when changed

When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.

This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy <jeff@melix.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: fix MDIO bus assignement for dual fec SoC's

On i.MX28, the MDIO bus is shared between the two FEC instances.
The driver makes sure that the second FEC uses the MDIO bus of the
first FEC. This is done conditionally if FEC_QUIRK_ENET_MAC is set.
However, in newer designs, such as Vybrid or i.MX6SX, each FEC MAC
has its own MDIO bus. Simply removing the quirk FEC_QUIRK_ENET_MAC
is not an option since other logic, triggered by this quirk, is
still needed.

Furthermore, there are board designs which use the same MDIO bus
for both PHY's even though the second bus would be available on the
SoC side. Such layout are popular since it saves pins on SoC side.
Due to the above quirk, those boards currently do work fine. The
boards in the mainline tree with such a layout are:
- Freescale Vybrid Tower with TWR-SER2 (vf610-twr.dts)
- Freescale i.MX6 SoloX SDB Board (imx6sx-sdb.dts)

This patch adds a new quirk FEC_QUIRK_SINGLE_MDIO for i.MX28, which
makes sure that the MDIO bus of the first FEC is used in any case.

However, the boards above do have a SoC with a MDIO bus for each FEC
instance. But the PHY's are not connected in a 1:1 configuration. A
proper device tree description is needed to allow the driver to
figure out where to find its PHY. This patch fixes that shortcoming
by adding a MDIO bus child node to the first FEC instance, along
with the two PHY's on that bus, and making use of the phy-handle
property to add a reference to the PHY's.

Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'leds-fixes-for-3.19' of git://git./linux/kernel/git/cooloney/linux-leds

Pull LED fix from Bryan Wu.

* 'leds-fixes-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: netxbig: fix oops at probe time

Merge tag 'for-linus' of git://git./linux/kernel/git/borntraeger/linux

Pull WRITE_ONCE argument order change from Christian Borntraeger:
"As discussed on LKML[1] it was agreed that WRITE_ONCE(x, val) is
  better than ASSIGN_ONCE(val, x)

  Lets change that for 3.19 as 3.19 has no user yet, but the first users
  will hit linux-next soon"

[1] http://marc.info/?l=linux-kernel&m=142081181707596

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
  kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)

xen-netfront: use different locks for Rx and Tx stats

In netfront the Rx and Tx path are independent and use different
locks.  The Tx lock is held with hard irqs disabled, but Rx lock is
held with only BH disabled.  Since both sides use the same stats lock,
a deadlock may occur.

  [ INFO: possible irq lock inversion dependency detected ]
  3.16.2 #16 Not tainted
  ---------------------------------------------------------
  swapper/0/0 just changed the state of lock:
   (&(&queue->tx_lock)->rlock){-.....}, at: [<c03adec8>]
  xennet_tx_interrupt+0x14/0x34
  but this lock took another, HARDIRQ-unsafe lock in the past:
   (&stat->syncp.seq#2){+.-...}
  and interrupts could create inverse lock ordering between them.
  other info that might help us debug this:
   Possible interrupt unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&stat->syncp.seq#2);
                                 local_irq_disable();
                                 lock(&(&queue->tx_lock)->rlock);
                                 lock(&stat->syncp.seq#2);
    <Interrupt>
      lock(&(&queue->tx_lock)->rlock);

Using separate locks for the Rx and Tx stats fixes this deadlock.

Reported-by: Dmitry Piotrovsky <piotrovskydmitry@gmail.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: fix multicast flush in dual emac mode

Since ALE table is a common resource for both the interfaces in Dual EMAC
mode and while bringing up the second interface in cpsw_ndo_set_rx_mode()
all the multicast entries added by the first interface is flushed out and
only second interface multicast addresses are added. Fixing this by
flushing multicast addresses based on dual EMAC port vlans which will not
affect the other emac port multicast addresses.

Fixes: d9ba8f9 (driver: net: ethernet: cpsw: dual emac interface implementation)
Cc: <stable@vger.kernel.org> # v3.9+
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

leds: netxbig: fix oops at probe time

This patch fixes a NULL pointer dereference on led_dat->mode_val. Due to
this bug, a kernel oops can be observed at probe time on the LaCie 2Big
and 5Big v2 boards:

Unable to handle kernel NULL pointer dereference at virtual address 00000008
[...]
[<c03f244c>] (netxbig_led_probe) from [<c02c8c6c>] (platform_drv_probe+0x4c/0x9c)
[<c02c8c6c>] (platform_drv_probe) from [<c02c72d0>] (driver_probe_device+0x98/0x25c)
[<c02c72d0>] (driver_probe_device) from [<c02c7520>] (__driver_attach+0x8c/0x90)
[<c02c7520>] (__driver_attach) from [<c02c5c24>] (bus_for_each_dev+0x68/0x94)
[<c02c5c24>] (bus_for_each_dev) from [<c02c6408>] (bus_add_driver+0x124/0x1dc)
[<c02c6408>] (bus_add_driver) from [<c02c7ac0>] (driver_register+0x78/0xf8)
[<c02c7ac0>] (driver_register) from [<c000888c>] (do_one_initcall+0x80/0x1cc)
[<c000888c>] (do_one_initcall) from [<c0733618>] (kernel_init_freeable+0xe4/0x1b4)
[<c0733618>] (kernel_init_freeable) from [<c058db9c>] (kernel_init+0xc/0xec)
[<c058db9c>] (kernel_init) from [<c0009850>] (ret_from_fork+0x14/0x24)
[...]

This bug was introduced by commit 588a6a99286ae30afb1339d8bc2163517b1b7dd1
("leds: netxbig: fix attribute-creation race").

Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: <stable@vger.kernel.org> # 3.17+
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bryan Wu <cooloney@gmail.com>

cxgb4vf: Initialize mdio_addr before using it

In commit 5ad24def21b205a8 ("cxgb4vf: Fix ethtool get_settings for VF driver")
mdio_addr of port_info structure was used unininitialzed. Fixing it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)

Feedback has shown that WRITE_ONCE(x, val) is easier to use than
ASSIGN_ONCE(val,x).
There are no in-tree users yet, so lets change it for 3.19.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Merge tag 'linux-kselftest-3.19-rc-5' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"This update contains three patches to fix one compile error, and two
  run-time bugs.  One of them fixes infinite loop on ARM"

* tag 'linux-kselftest-3.19-rc-5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/vm: fix link error for transhuge-stress test
  tools: testing: selftests: mq_perf_tests: Fix infinite loop on ARM
  selftests/exec: allow shell return code of 126

Merge tag 'stable/for-linus-3.19-rc4-tag' of git://git./linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:
"Several critical linear p2m fixes that prevented some hosts from
  booting"

* tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: properly retrieve NMI reason
  xen: check for zero sized area when invalidating memory
  xen: use correct type for physical addresses
  xen: correct race in alloc_p2m_pmd()
  xen: correct error for building p2m list on 32 bits
  x86/xen: avoid freeing static 'name' when kasprintf() fails
  x86/xen: add extra memory for remapped frames during setup
  x86/xen: don't count how many PFNs are identity mapped
  x86/xen: Free bootmem in free_p2m_page() during early boot
  x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()

net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations

Corrected the comment describing the ndo operations to
reflect the actual prototype for couple of operations

Signed-off-by: B Viswanath <marichika4@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-rc' of git://git./linux/kernel/git/rzhang/linux

Pull thermal management fixes from Zhang Rui:
"Specifics:

   - Fix a problem that Intel SoC DTS thermal driver does not work when
     CONFIG_THERMAL_INT340X is not set.

   - Fix a NULL pointer dereference when processor_thermal_device driver
     is loaded on a platform without ACPI support"

* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  int340x_thermal/processor_thermal_device: return failure when
  ACPI/int340x_thermal: enumerate INT3401 for Intel SoC DTS thermal driver
  ACPI/int340x_thermal: enumerate INT340X devices even if they're not in _ART/_TRT

locks: fix NULL-deref in generic_delete_lease

commit 0efaa7e82f02fe69c05ad28e905f31fc86e6f08e
locks: generic_delete_lease doesn't need a file_lock at all

moves the call to fl->fl_lmops->lm_change() to a place in the
code where fl might be a non-lease lock.
When that happens, fl_lmops is NULL and an Oops ensures.

So add an extra test to restore correct functioning.

Reported-by: Linda Walsh <suse@tlinx.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=912569
Cc: stable@vger.kernel.org (v3.18)
Fixes: 0efaa7e82f02fe69c05ad28e905f31fc86e6f08e
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

x86/xen: properly retrieve NMI reason

Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.

Note that the hook can (and should) be used irrespective of whether
being in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Merge tag 'gpio-v3.19-3' of git://git./linux/kernel/git/linusw/linux-gpio

Pull gpio fixes from Linus Walleij:
"Here are some GPIO fixes, mainly affecting the DLN2 IRQ handling.
  Nothing special about them, just fixes:

   - Three patches fixing IRQ handling for the DLN2
   - Null pointer handling for grgpio"

* tag 'gpio-v3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: dln2: use bus_sync_unlock instead of scheduling work
  gpio: grgpio: Avoid potential NULL pointer dereference
  gpio: dln2: Fix gpio output value in dln2_gpio_direction_output()
  gpio: dln2: fix issue when an IRQ is unmasked then enabled

Merge tag 'mmc-v3.19-3' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fixes from Ulf Hansson:
"MMC host:
   - sdhci-pci|acpi: Support some new IDs
   - sdhci: Fix sleep from atomic context
   - sdhci-pxav3: Prevent hang during ->probe()
   - sdhci: Disable re-tuning for HS400"

* tag 'mmc-v3.19-3' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: sdhci-pci: Add support for Intel SPT
  mmc: sdhci-acpi: Add ACPI HID INT344D
  mmc: sdhci: Fix sleep in atomic after inserting SD card
  mmc: sdhci-pxav3: do the mbus window configuration after enabling clocks
  mmc: sdhci: Disable re-tuning for HS400
  mmc: sdhci: Simplify use of tuning timer
  mmc: sdhci: Add out_unlock to sdhci_execute_tuning
  mmc: sdhci: Tuning should not change max_blk_count

Merge git://git./linux/kernel/git/nab/target-pending

Pull scsi target fixes from Nicholas Bellinger:
"Mostly minor fixes this time, including:

   - Add missing virtio-scsi -> TCM attribute conversion in vhost-scsi.
   - Fix persistent reservations write exclusive handling to allow
     readers for all registered I_T nexuses.
   - Drop arbitrary maximum I/O size limit in order to process I/Os
     larger than 4 MB, required for initiators that don't honor block
     limits EVPD.
   - Drop the now left-over fabric_max_sectors attribute"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: Fix typos in enum cmd_flags_table
  MAINTAINERS: Add entry for iSER target driver
  target: Allow Write Exclusive non-reservation holders to READ
  target: Drop left-over fabric_max_sectors attribute
  target: Drop arbitrary maximum I/O size limit
  Documentation/target: Update fabric_ops to latest code
  vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion

mm: mmu_gather: use tlb->end != 0 only for TLB invalidation

When batching up address ranges for TLB invalidation, we check tlb->end
!= 0 to indicate that some pages have actually been unmapped.

As of commit f045bbb9fa1b ("mmu_gather: fix over-eager
tlb_flush_mmu_free() calling"), we use the same check for freeing these
pages in order to avoid a performance regression where we call
free_pages_and_swap_cache even when no pages are actually queued up.

Unfortunately, the range could have been reset (tlb->end = 0) by
tlb_end_vma, which has been shown to cause memory leaks on arm64.
Furthermore, investigation into these leaks revealed that the fullmm
case on task exit no longer invalidates the TLB, by virtue of tlb->end
== 0 (in 3.18, need_flush would have been set).

This patch resolves the problem by reverting commit f045bbb9fa1b, using
instead tlb->local.nr as the predicate for page freeing in
tlb_flush_mmu_free and ensuring that tlb->end is initialised to a
non-zero value in the fullmm case.

Tested-by: Mark Langsdorf <mlangsdo@redhat.com>
Tested-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()

Commit e4c7f259c5be ("USB: kaweth.c: use GFP_ATOMIC under spin_lock")
makes sure that kaweth_internal_control_msg() allocates memory with GFP_ATOMIC,
but kaweth_internal_control_msg() also calls usb_start_wait_urb()
that still allocates memory with GFP_NOIO.

The patch fixes usb_start_wait_urb() as well.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add me as ibmveth maintainer

Adding myself as the ibmveth maintainer and replacing
Santiago Leon.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: Santiago Leon <santi_leon@yahoo.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix bug in broadcast retransmit code

In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

update ip-sysctl.txt documentation (v2)

Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for ipv4.

Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/at91_ether: prepare and unprepare clock

The clock is enabled without being prepared, this leads to:

WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:889 __clk_enable+0x24/0xa8()

and a non working ethernet interface.

Use clk_prepare_enable() and clk_disable_unprepare() to handle the clock.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: fix NUL (\0 or \x00) specification in string

In C one can either use '\0' or '\x00' (or '\000') to add a NUL byte to
a string. '\0x00' isn't part of these and will in fact result in a
single NUL followed by "x00". This fixes that.

Signed-off-by: Giel van Schijndel <me@mortis.eu>
Reported-at: http://www.viva64.com/en/b/0299/
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: KVM: Fix HCR setting for 32bit guests

Commit b856a59141b1 (arm/arm64: KVM: Reset the HCR on each vcpu
when resetting the vcpu) moved the init of the HCR register to
happen later in the init of a vcpu, but left out the fixup
done in kvm_reset_vcpu when preparing for a 32bit guest.

As a result, the 32bit guest is run as a 64bit guest, but the
rest of the kernel still manages it as a 32bit. Fun follows.

Moving the fixup to vcpu_reset_hcr solves the problem for good.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

arm64: KVM: Fix TLB invalidation by IPA/VMID

It took about two years for someone to notice that the IPA passed
to TLBI IPAS2E1IS must be shifted by 12 bits. Clearly our reviewing
is not as good as it should be...

Paper bag time for me.

Reported-by: Mario Smarduch <m.smarduch@samsung.com>
Tested-by: Mario Smarduch <m.smarduch@samsung.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

xen: check for zero sized area when invalidating memory

With the introduction of the linear mapped p2m list setting memory
areas to "invalid" had to be delayed. When doing the invalidation
make sure no zero sized areas are processed.

Signed-off-by: Juegren Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

xen: use correct type for physical addresses

When converting a pfn to a physical address be sure to use 64 bit
wide types or convert the physical address to a pfn if possible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

xen: correct race in alloc_p2m_pmd()

When allocating a new pmd for the linear mapped p2m list a check is
done for not introducing another pmd when this just happened on
another cpu. In this case the old pte pointer was returned which
points to the p2m_missing or p2m_identity page. The correct value
would be the pointer to the found new page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

xen: correct error for building p2m list on 32 bits

In xen_rebuild_p2m_list() for large areas of invalid or identity
mapped memory the pmd entries on 32 bit systems are initialized
wrong. Correct this error.

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

s390/uprobes: fix user space PER events

If uprobes are single stepped for example with gdb, the behavior should
now be correct. Before this patch, when gdb was single stepping a uprobe,
the result was a SIGILL.
When PER is active for any storage alteration and a uprobe is hit, a storage
alteration event is indicated. These over indications are filterd out by gdb,
if no change has happened within the observed area.

Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

mmc: sdhci-pci: Add support for Intel SPT

Add PCI IDs for SPT eMMC, SDIO and SD card.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-acpi: Add ACPI HID INT344D

Add ACPI HID INT344D for an Intel SDIO host controller.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: Fix sleep in atomic after inserting SD card

Sleep in atomic context happened on Trats2 board after inserting or
removing SD card because mmc_gpio_get_cd() was called under spin lock.

Fix this by moving card detection earlier, before acquiring spin lock.
The mmc_gpio_get_cd() call does not have to be protected by spin lock
because it does not access any sdhci internal data.
The sdhci_do_get_cd() call access host flags (SDHCI_DEVICE_DEAD). After
moving it out side of spin lock it could theoretically race with driver
removal but still there is no actual protection against manual card
eject.

Dmesg after inserting SD card:
[   41.663414] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1511
[   41.670469] in_atomic(): 1, irqs_disabled(): 128, pid: 30, name: kworker/u8:1
[   41.677580] INFO: lockdep is turned off.
[   41.681486] irq event stamp: 61972
[   41.684872] hardirqs last  enabled at (61971): [<c0490ee0>] _raw_spin_unlock_irq+0x24/0x5c
[   41.693118] hardirqs last disabled at (61972): [<c04907ac>] _raw_spin_lock_irq+0x18/0x54
[   41.701190] softirqs last  enabled at (61648): [<c0026fd4>] __do_softirq+0x234/0x2c8
[   41.708914] softirqs last disabled at (61631): [<c00273a0>] irq_exit+0xd0/0x114
[   41.716206] Preemption disabled at:[<  (null)>]   (null)
[   41.721500]
[   41.722985] CPU: 3 PID: 30 Comm: kworker/u8:1 Tainted: G        W      3.18.0-rc5-next-20141121 #883
[   41.732111] Workqueue: kmmcd mmc_rescan
[   41.735945] [<c0014d2c>] (unwind_backtrace) from [<c0011c80>] (show_stack+0x10/0x14)
[   41.743661] [<c0011c80>] (show_stack) from [<c0489d14>] (dump_stack+0x70/0xbc)
[   41.750867] [<c0489d14>] (dump_stack) from [<c0228b74>] (gpiod_get_raw_value_cansleep+0x18/0x30)
[   41.759628] [<c0228b74>] (gpiod_get_raw_value_cansleep) from [<c03646e8>] (mmc_gpio_get_cd+0x38/0x58)
[   41.768821] [<c03646e8>] (mmc_gpio_get_cd) from [<c036d378>] (sdhci_request+0x50/0x1a4)
[   41.776808] [<c036d378>] (sdhci_request) from [<c0357934>] (mmc_start_request+0x138/0x268)
[   41.785051] [<c0357934>] (mmc_start_request) from [<c0357cc8>] (mmc_wait_for_req+0x58/0x1a0)
[   41.793469] [<c0357cc8>] (mmc_wait_for_req) from [<c0357e68>] (mmc_wait_for_cmd+0x58/0x78)
[   41.801714] [<c0357e68>] (mmc_wait_for_cmd) from [<c0361c00>] (mmc_io_rw_direct_host+0x98/0x124)
[   41.810480] [<c0361c00>] (mmc_io_rw_direct_host) from [<c03620f8>] (sdio_reset+0x2c/0x64)
[   41.818641] [<c03620f8>] (sdio_reset) from [<c035a3d8>] (mmc_rescan+0x254/0x2e4)
[   41.826028] [<c035a3d8>] (mmc_rescan) from [<c003a0e0>] (process_one_work+0x180/0x3f4)
[   41.833920] [<c003a0e0>] (process_one_work) from [<c003a3bc>] (worker_thread+0x34/0x4b0)
[   41.841991] [<c003a3bc>] (worker_thread) from [<c003fed8>] (kthread+0xe4/0x104)
[   41.849285] [<c003fed8>] (kthread) from [<c000f268>] (ret_from_fork+0x14/0x2c)
[   42.038276] mmc0: new high speed SDHC card at address 1234

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 94144a465dd0 ("mmc: sdhci: add get_cd() implementation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-pxav3: do the mbus window configuration after enabling clocks

In commit 5491ce3f79ee ("mmc: sdhci-pxav3: add support for the Armada
38x SDHCI controller"), the sdhci-pxav3 driver was extended to include
support for the SDHCI controller found in the Armada 38x
processor. This mainly involved adding some MBus window related
configuration.

However, this configuration is currently done too early in ->probe():
it is done before clocks are enabled, while this configuration
involves touching the registers of the controller, which will hang the
SoC if the clock is disabled. It wasn't noticed until now because the
bootloader typically leaves gatable clocks enabled, but in situations
where we have a deferred probe (due to a CD GPIO that cannot be taken,
for example), then the probe will be re-tried later, after a clock
disable has been done in the exit path of the failed probe attempt of
the device. This second probe() will hang the system due to the clock
being disabled.

This can for example be produced on Armada 385 GP, which has a CD GPIO
connected to an I2C PCA9555. If the driver for the PCA9555 is not
compiled into the kernel, then we will have the following sequence of
events:

  1. The SDHCI probes
  2. It does the MBus configuration (which works, because the clock is
     left enabled by the bootloader)
  3. It enables the clock
  4. It tries to get the CD GPIO, which fails due to the driver being
     missing, so -EPROBE_DEFER is returned.
  5. Before returning -EPROBE_DEFER, the driver cleans up what was
     done, which includes disabling the clock.
  6. Later on, the SDHCI probe is tried again.
  7. It does the MBus configuration, which hangs because the clock is
     no longer enabled.

This commit does the obvious fix of doing the MBus configuration after
the clock has been enabled by the driver.

Fixes: 5491ce3f79ee ("mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: Disable re-tuning for HS400

Re-tuning for HS400 mode must be done in HS200
mode. Currently there is no support for that.
That needs to be reflected in the code.
Specifically, if tuning is executed in HS400 mode
then return an error, and do not start the
tuning timer if HS200 tuning is being done prior
to switching to HS400.

Note that periodic re-tuning is not expected
to be needed for HS400 but re-tuning is still
needed after the host controller has lost power.
In the case of suspend/resume that is not necessary
because the card is fully re-initialised. That
just leaves runtime suspend/resume with no support
for HS400 re-tuning.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: Simplify use of tuning timer

The tuning timer is always used if the tuning mode
is 1 and there is a tuning count, irrespective of
whether this is the first call, or any subsequent
call. Consequently the logic to start the timer
can be simplified.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: Add out_unlock to sdhci_execute_tuning

A 'goto' can be used to save duplicating unlocking
and returning.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: Tuning should not change max_blk_count

Re-tuning requires that the maximum data length
is limited to 4MiB. The code currently changes
max_blk_count in an attempt to achieve that.
This is wrong because max_blk_count is a different
limit, but it is also un-necessary because
max_req_size is 512KiB anyway. Consequently, the
changes to max_blk_count are removed and the
comment for max_req_size adjusted accordingly.
The comment is also tweaked to show that the 512KiB
limit is a SDMA limit not an ADMA limit.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Merge tag 'wireless-drivers-for-davem-2015-01-09' of git://git./linux/kernel/git/kvalo/wireless-drivers

* rtlwifi: fix a regression in large skb allocation failure

iwlwifi:

* fix for 7265D NVM check
* fixes for scan: fix long scanning times and network discovery
* new firmware API for iwlmvm supported devices
* fixes in rate control

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
   unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
   christophe.

3) Fix race condition between conntrack confirmation and flush from
   userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
   applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
   the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
   Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

packet: bail out of packet_snd() if L2 header creation fails

Due to a misplaced parenthesis, the expression

(unlikely(offset) < 0),

which expands to

(__builtin_expect(!!(offset), 0) < 0),

never evaluates to true. Therefore, when sending packets with
PF_PACKET/SOCK_DGRAM, packet_snd() does not abort as intended
if the creation of the layer 2 header fails.

Spotted by Coverity - CID 1259975 ("Operands don't affect result").

Fixes: 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Christoph Jaeger <cj@linux.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

alx: fix alx_poll()

Commit d75b1ade567f ("net: less interrupt masking in NAPI") uncovered
wrong alx_poll() behavior.

A NAPI poll() handler is supposed to return exactly the budget when/if
napi_complete() has not been called.

It is also supposed to return number of frames that were received, so
that netdev_budget can have a meaning.

Also, in case of TX pressure, we still have to dequeue received
packets : alx_clean_rx_irq() has to be called even if
alx_clean_tx_irq(alx) returns false, otherwise device is half duplex.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade567f ("net: less interrupt masking in NAPI")
Reported-by: Oded Gabbay <oded.gabbay@amd.com>
Bisected-by: Oded Gabbay <oded.gabbay@amd.com>
Tested-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dnet: fix dnet_poll()

A NAPI poll() handler is supposed to return exactly the budget when/if
napi_complete() has not been called.

It is also supposed to return number of frames that were received, so
that netdev_budget can have a meaning.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

linux 3.19-rc4

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Three small fixes from over the Christmas period, and wiring up the
  new execveat syscall for ARM"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error
  ARM: 8253/1: mm: use phys_addr_t type in map_lowmem() for kernel mem region
  ARM: 8249/1: mm: dump: don't skip regions
  ARM: wire up execveat syscall

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes: two vdso fixes, two kbuild fixes and a boot failure fix
  with certain odd memory mappings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Use asm volatile in __getcpu
  x86/build: Clean auto-generated processor feature files
  x86: Fix mkcapflags.sh bash-ism
  x86: Fix step size adjustment during initial memory mapping
  x86_64, vdso: Fix the vdso address randomization algorithm

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Misc fixes: group scheduling corner case fix, two deadline scheduler
  fixes, effective_load() overflow fix, nested sleep fix, 6144 CPUs
  system fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()
  sched/deadline: Avoid double-accounting in case of missed deadlines
  sched/deadline: Fix migration of SCHED_DEADLINE tasks
  sched: Fix odd values in effective_load() calculations
  sched, fanotify: Deal with nested sleeps
  sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also some kernel side fixes: uncore PMU
  driver fix, user regs sampling fix and an instruction decoder fix that
  unbreaks PEBS precise sampling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes
  perf/x86_64: Improve user regs sampling
  perf: Move task_pt_regs sampling into arch code
  x86: Fix off-by-one in instruction decoder
  perf hists browser: Fix segfault when showing callchain
  perf callchain: Free callchains when hist entries are deleted
  perf hists: Fix children sort key behavior
  perf diff: Fix to sort by baseline field by default
  perf list: Fix --raw-dump option
  perf probe: Fix crash in dwarf_getcfi_elf
  perf probe: Fix to fall back to find probe point in symbols
  perf callchain: Append callchains only when requested
  perf ui/tui: Print backtrace symbols when segfault occurs
  perf report: Show progress bar for output resorting

Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
"A liblockdep fix and a mutex_unlock() mutex-debugging fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Always clear owner field upon mutex_unlock()
tools/liblockdep: Fix debug_check thinko in mutex destroy

mm: fix corner case in anon_vma endless growing prevention

Fix for BUG_ON(anon_vma->degree) splashes in unlink_anon_vmas() ("kernel
BUG at mm/rmap.c:399!") caused by commit 7a3ef208e662 ("mm: prevent
endless growth of anon_vma hierarchy")

Anon_vma_clone() is usually called for a copy of source vma in
destination argument.  If source vma has anon_vma it should be already
in dst->anon_vma.  NULL in dst->anon_vma is used as a sign that it's
called from anon_vma_fork().  In this case anon_vma_clone() finds
anon_vma for reusing.

Vma_adjust() calls it differently and this breaks anon_vma reusing
logic: anon_vma_clone() links vma to old anon_vma and updates degree
counters but vma_adjust() overrides vma->anon_vma right after that.  As
a result final unlink_anon_vmas() decrements degree for wrong anon_vma.

This patch assigns ->anon_vma before calling anon_vma_clone().

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com>
Reported-and-tested-by: Oded Gabbay <oded.gabbay@amd.com>
Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Forrest <dan.forrest@ssec.wisc.edu>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # to match back-porting of 7a3ef208e662
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: Don't count the stack guard page towards RLIMIT_STACK

Commit fee7e49d4514 ("mm: propagate error from stack expansion even for
guard page") made sure that we return the error properly for stack
growth conditions. It also theorized that counting the guard page
towards the stack limit might break something, but also said "Let's see
if anybody notices".

Somebody did notice. Apparently android-x86 sets the stack limit very
close to the limit indeed, and including the guard page in the rlimit
check causes the android 'zygote' process problems.

So this adds the (fairly trivial) code to make the stack rlimit check be
against the actual real stack size, rather than the size of the vma that
includes the guard page.

Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Jay Foad <jay.foad@gmail.com>
Cc: stable@kernel.org # to match back-porting of fee7e49d4514
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'core/urgent' into locking/urgent, to collect all pending locking fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'vfio-v3.19-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fix from Alex Williamson:
"Fix PCI header check in vfio_pci_probe() (Wei Yang)"

* tag 'vfio-v3.19-rc4' of git://github.com/awilliam/linux-vfio:
vfio-pci: Fix the check on pci device type in vfio_pci_probe()

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"Just one fix: a qlogic busy wait regression"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
qla2xxx: fix busy wait regression

Merge tag 'sound-3.19-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"All a few small regression or stable fixes: a Nvidia HDMI ID addition,
  a regression fix for CAIAQ stream count, a typo fix for GPIO setup
  with STAC/IDT HD-audio codecs, and a Fireworks big-endian fix"

* tag 'sound-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: fireworks: fix an endianness bug for transaction length
  ALSA: hda - Add new GPU codec ID 0x10de0072 to snd-hda
  ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs
  ALSA: snd-usb-caiaq: fix stream count check

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID updates from Jiri Kosina:

- bounds checking fixes in logitech and roccat drivers, from Peter Wu
   and Dan Carpenter

- double-kfree fix in i2c-hid driver on bus shutdown, from Mika
   Westerberg

- a couple of various small driver fixes

- a few device id additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
  HID: Add a new id 0x501a for Genius MousePen i608X
  HID: logitech-hidpp: prefix the name with "Logitech"
  HID: logitech-hidpp: avoid unintended fall-through
  HID: Allow HID_BATTERY_STRENGTH to be enabled
  HID: i2c-hid: Do not free buffers in i2c_hid_stop()
  HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard
  HID: logitech-hidpp: check WTP report length
  HID: logitech-dj: check report length

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"I'm briefly working between holidays and LCA, so this is close to a
  couple of weeks of fixes,

  Two sets of amdkfd fixes, this is a new feature this kernel, and this
  pull fixes a few issues since it got merged, ordering when built-in to
  kernel and also the iommu vs gpu ordering patch, it also reworks the
  ioctl before the initial release.

  Otherwise:
   - radeon: some misc fixes all over, hdmi, 4k, dpm
   - nouveau: mcp77 init fixes, oops fix, bug on fix, msi fix
   - i915: power fixes, revert VGACNTR patch

  Probably be quiteer next week since I'll be at LCA anyways"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
  drm/amdkfd: rewrite kfd_ioctl() according to drm_ioctl()
  drm/amdkfd: reformat IOCTL definitions to drm-style
  drm/amdkfd: Do copy_to/from_user in general kfd_ioctl()
  drm/radeon: integer underflow in radeon_cp_dispatch_texture()
  drm/radeon: adjust default bapm settings for KV
  drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
  drm/radeon: fix sad_count check for dce3
  drm/radeon: KV has three PPLLs (v2)
  drm/amdkfd: unmap VMID<-->PASID when relesing VMID (non-HWS)
  drm/radeon: Init amdkfd only if it was compiled
  amdkfd: actually allocate longs for the pasid bitmask
  drm/nouveau/nouveau: Do not BUG_ON(!spin_is_locked()) on UP
  drm/nv4c/mc: disable msi
  drm/nouveau/fb/ram/mcp77: enable NISO poller
  drm/nouveau/fb/ram/mcp77: use carveout reg to determine size
  drm/nouveau/fb/ram/mcp77: subclass nouveau_ram
  drm/nouveau: wake up the card if necessary during gem callbacks
  drm/nouveau/device: Add support for GK208B, resolves bug 86935
  drm/nouveau: fix missing return statement in nouveau_ttm_tt_unpopulate
  drm/nouveau/bios: fix oops on pre-nv50 chipsets
  ...

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Here is a handful of minor arm64 fixes discovered and fixed over the
  Christmas break.  The main part is adding some missing #includes that
  we seem to be getting transitively but have started causing problems
  in -next.

   - Fix early mapping fixmap corruption by EFI runtime services
   - Fix __NR_compat_syscalls off-by-one
   - Add missing sanity checks for some 32-bit registers
   - Add some missing #includes which we get transitively
   - Remove unused prepare_to_copy() macro"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/efi: add missing call to early_ioremap_reset()
  arm64: fix missing asm/io.h include in kernel/smp_spin_table.c
  arm64: fix missing asm/alternative.h include in kernel/module.c
  arm64: fix missing linux/bug.h include in asm/arch_timer.h
  arm64: fix missing asm/pgtable-hwdef.h include in asm/processor.h
  arm64: sanity checks: add missing AArch32 registers
  arm64: Remove unused prepare_to_copy()
  arm64: Correct __NR_compat_syscalls for bpf

Merge tag 'for_linus-3.19-rc4' of git://git./linux/kernel/git/jwessel/kgdb

Pull kgdb/kdb fixes from Jason Wessel:
"These have been around since 3.17 and in kgdb-next for the last 9
  weeks and some will go back to -stable.

  Summary of changes:

  Cleanups
   - kdb: Remove unused command flags, repeat flags and KDB_REPEAT_NONE

  Fixes
   - kgdb/kdb: Allow access on a single core, if a CPU round up is
     deemed impossible, which will allow inspection of the now "trashed"
     kernel
   - kdb: Add enable mask for the command groups
   - kdb: access controls to restrict sensitive commands"

* tag 'for_linus-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  kernel/debug/debug_core.c: Logging clean-up
  kgdb: timeout if secondary CPUs ignore the roundup
  kdb: Allow access to sensitive commands to be restricted by default
  kdb: Add enable mask for groups of commands
  kdb: Categorize kdb commands (similar to SysRq categorization)
  kdb: Remove KDB_REPEAT_NONE flag
  kdb: Use KDB_REPEAT_* values as flags
  kdb: Rename kdb_register_repeat() to kdb_register_flags()
  kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags
  kdb: Remove currently unused kdbtab_t->cmd_flags

Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux

Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
nfsd: fix fi_delegees leak when fi_had_conflict returns true

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull two Ceph fixes from Sage Weil:
"These are both pretty trivial: a sparse warning fix and size_t printk
  thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix sparse endianness warnings
  ceph: use %zu for len in ceph_fill_inline_data()

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"None of these are huge, but my commit does fix a regression from 3.18
  that could cause lost files during log replay.

  This also adds Dave Sterba to the list of Btrfs maintainers.  It
  doesn't mean we're doing things differently, but Dave has really been
  helping with the maintainer workload for years"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't delay inode ref updates during log replay
  Btrfs: correctly get tree level in tree_backref_for_extent
  Btrfs: call inode_dec_link_count() on mkdir error path
  Btrfs: abort transaction if we don't find the block group
  Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
  Btrfs: add more maintainers

iscsi-target: Fix typos in enum cmd_flags_table

Everything else starts with ICF so the last two should as well.

Fix places they are used to match.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

MAINTAINERS: Add entry for iSER target driver

iSCSI extensions for RDMA - Target mode.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Allow Write Exclusive non-reservation holders to READ

For PGR reservation of type Write Exclusive Access, allow all non
reservation holding I_T nexuses with active registrations to READ
from the device.

This addresses a bug where active registrations that attempted
to READ would result in an reservation conflict.

Signed-off-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Drop left-over fabric_max_sectors attribute

Now that fabric_max_sectors is no longer used to enforce the maximum
I/O size, go ahead and drop it's left-over usage in target-core and
associated backend drivers.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Drop arbitrary maximum I/O size limit

This patch drops the arbitrary maximum I/O size limit in sbc_parse_cdb(),
which currently for fabric_max_sectors is hardcoded to 8192 (4 MB for 512
byte sector devices), and for hw_max_sectors is a backend driver dependent
value.

This limit is problematic because Linux initiators have only recently
started to honor block limits MAXIMUM TRANSFER LENGTH, and other non-Linux
based initiators (eg: MSFT Fibre Channel) can also generate I/Os larger
than 4 MB in size.

Currently when this happens, the following message will appear on the
target resulting in I/Os being returned with non recoverable status:

SCSI OP 28h with too big sectors 16384 exceeds fabric_max_sectors: 8192

Instead, drop both [fabric,hw]_max_sector checks in sbc_parse_cdb(),
and convert the existing hw_max_sectors into a purely informational
attribute used to represent the granuality that backend driver and/or
subsystem code is splitting I/Os upon.

Also, update FILEIO with an explicit FD_MAX_BYTES check in fd_execute_rw()
to deal with the one special iovec limitiation case.

v2 changes:
- Drop hw_max_sectors check in sbc_parse_cdb()

Reported-by: Lance Gropper <lance.gropper@qosserver.com>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: stable@vger.kernel.org # 3.4
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"12 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  memcg: fix destination cgroup leak on task charges migration
  mm: memcontrol: switch soft limit default back to infinity
  mm/debug_pagealloc: remove obsolete Kconfig options
  vfs: renumber FMODE_NONOTIFY and add to uniqueness check
  arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h
  ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file
  MAINTAINERS: update rydberg's addresses
  mm: protect set_page_dirty() from ongoing truncation
  mm: prevent endless growth of anon_vma hierarchy
  exit: fix race between wait_consider_task() and wait_task_zombie()
  ocfs2: remove bogus check in dlm_process_recovery_data

ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error

In v3.19-rc3 tree when CONFIG_ARM_LPAE and CONFIG_DEBUG_RODATA are enabled
image failed to compile with the following error:

arch/arm/mm/init.c:661:14: error: ‘PMD_SECT_RDONLY’ undeclared here (not in a function)

It seems that '80d6b0c ARM: mm: allow text and rodata sections to be read-only'
and 'ded9477 ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE'
commits crossed. 80d6b0c uses PMD_SECT_RDONLY macro but ded9477 renames it
and uses software bits L_PMD_SECT_RDONLY instead.

Fix is to use L_PMD_SECT_RDONLY instead PMD_SECT_RDONLY as ded9477 does in
another places.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

HID: roccat: potential out of bounds in pyra_sysfs_write_settings()

This is a static checker fix. We write some binary settings to the
sysfs file. One of the settings is the "->startup_profile". There
isn't any checking to make sure it fits into the
pyra->profile_settings[] array in the profile_activated() function.

I added a check to pyra_sysfs_write_settings() in both places because
I wasn't positive that the other callers were correct.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

mutex: Always clear owner field upon mutex_unlock()

Currently if DEBUG_MUTEXES is enabled, the mutex->owner field is only
cleared iff debug_locks is active. This exposes a race to other users of
the field where the mutex->owner may be still set to a stale value,
potentially upsetting mutex_spin_on_owner() among others.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87955
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420540175-30204-1-git-send-email-chris@chris-wilson.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()

When alloc_fair_sched_group() in sched_create_group() fails,
free_sched_group() is called, and free_fair_sched_group() is called by
free_sched_group(). Since destroy_cfs_bandwidth() is called by
free_fair_sched_group() without calling init_cfs_bandwidth(),
RCU stall occurs at hrtimer_cancel():

  INFO: rcu_sched self-detected stall on CPU { 1}  (t=60000 jiffies g=13074 c=13073 q=0)
  Task dump for CPU 1:
  (fprintd)       R  running task        0  6249      1 0x00000088
  ...
  Call Trace:
   <IRQ>  [<ffffffff81094988>] sched_show_task+0xa8/0x110
   [<ffffffff81097acd>] dump_cpu_task+0x3d/0x50
   [<ffffffff810c3a80>] rcu_dump_cpu_stacks+0x90/0xd0
   [<ffffffff810c7751>] rcu_check_callbacks+0x491/0x700
   [<ffffffff810cbf2b>] update_process_times+0x4b/0x80
   [<ffffffff810db046>] tick_sched_handle.isra.20+0x36/0x50
   [<ffffffff810db0a2>] tick_sched_timer+0x42/0x70
   [<ffffffff810ccb19>] __run_hrtimer+0x69/0x1a0
   [<ffffffff810db060>] ? tick_sched_handle.isra.20+0x50/0x50
   [<ffffffff810ccedf>] hrtimer_interrupt+0xef/0x230
   [<ffffffff810452cb>] local_apic_timer_interrupt+0x3b/0x70
   [<ffffffff8164a465>] smp_apic_timer_interrupt+0x45/0x60
   [<ffffffff816485bd>] apic_timer_interrupt+0x6d/0x80
   <EOI>  [<ffffffff810cc588>] ? lock_hrtimer_base.isra.23+0x18/0x50
   [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
   [<ffffffff810cc9d2>] hrtimer_try_to_cancel+0x22/0xd0
   [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
   [<ffffffff810ccaa2>] hrtimer_cancel+0x22/0x30
   [<ffffffff810a3cb5>] free_fair_sched_group+0x25/0xd0
   [<ffffffff8108df46>] free_sched_group+0x16/0x40
   [<ffffffff810971bb>] sched_create_group+0x4b/0x80
   [<ffffffff810aa383>] sched_autogroup_create_attach+0x43/0x1c0
   [<ffffffff8107dc9c>] sys_setsid+0x7c/0x110
   [<ffffffff81647729>] system_call_fastpath+0x12/0x17

Check whether init_cfs_bandwidth() was called before calling
destroy_cfs_bandwidth().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Move the check into destroy_cfs_bandwidth() to aid compilability. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Avoid double-accounting in case of missed deadlines

The dl_runtime_exceeded() function is supposed to ckeck if
a SCHED_DEADLINE task must be throttled, by checking if its
current runtime is <= 0. However, it also checks if the
scheduling deadline has been missed (the current time is
larger than the current scheduling deadline), further
decreasing the runtime if this happens.
This "double accounting" is wrong:

- In case of partitioned scheduling (or single CPU), this
  happens if task_tick_dl() has been called later than expected
  (due to small HZ values). In this case, the current runtime is
  also negative, and replenish_dl_entity() can take care of the
  deadline miss by recharging the current runtime to a value smaller
  than dl_runtime

- In case of global scheduling on multiple CPUs, scheduling
  deadlines can be missed even if the task did not consume more
  runtime than expected, hence penalizing the task is wrong

This patch fix this problem by throttling a SCHED_DEADLINE task
only when its runtime becomes negative, and not modifying the runtime

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Fix migration of SCHED_DEADLINE tasks

According to global EDF, tasks should be migrated between runqueues
without checking if their scheduling deadlines and runtimes are valid.
However, SCHED_DEADLINE currently performs such a check:
a migration happens doing:

deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, later_rq->cpu);
activate_task(later_rq, next_task, 0);

which ends up calling dequeue_task_dl(), setting the new CPU, and then
calling enqueue_task_dl().

enqueue_task_dl() then calls enqueue_dl_entity(), which calls
update_dl_entity(), which can modify scheduling deadline and runtime,
breaking global EDF scheduling.

As a result, some of the properties of global EDF are not respected:
for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
two cores can have unbounded response times for the third task even
if 30/80+40/80+120/170 = 1.5809 < 2

This can be fixed by invoking update_dl_entity() only in case of
wakeup, or if this is a new SCHED_DEADLINE task.

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Fix odd values in effective_load() calculations

In effective_load, we have (long w * unsigned long tg->shares) / long W,
when w is negative, it is cast to unsigned long and hence the product is
insanely large. Fix this by casting tg->shares to long.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, fanotify: Deal with nested sleeps

As per e23738a7300a ("sched, inotify: Deal with nested sleeps").

fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.

Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>