openwrt/staging/blogic.git
Steven Rostedt (VMware) [Wed, 10 Jan 2018 13:24:17 +0000 (14:24 +0100)]
printk: Add console owner and waiter logic to load balance console writes

This patch implements what I discussed at Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.

Here's the design again:

I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.

There is also a console_waiter flag. It is set when there is an active
console owner that is not the current task, and no waiter is already
set. It too is protected by console_owner_lock.

In printk() when it tries to write to the consoles, we have:

    if (console_trylock())
        console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.

If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.

Then the waiter calls console_unlock() and continues to write to the
consoles.

If another task comes along and does a printk(), it too can become the
new waiter, and we wash, rinse, and repeat!
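The owner/waiter handoff described above can be modeled in userspace C. This is a sketch under stated assumptions, not the kernel implementation: C11 atomics stand in for the raw spinlock and READ_ONCE()/WRITE_ONCE(), `struct task` stands in for task_struct, and the helper names (owner_enable(), owner_disable_and_check(), try_become_waiter(), waiter_spin()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	int id;
};

static atomic_flag owner_lock = ATOMIC_FLAG_INIT; /* models console_owner_lock */
static struct task *console_owner;                /* task calling the consoles */
static atomic_bool console_waiter;                /* true while a waiter spins */

static void owner_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&owner_lock, memory_order_acquire))
		; /* spin, like a raw spinlock */
}

static void owner_lock_release(void)
{
	atomic_flag_clear_explicit(&owner_lock, memory_order_release);
}

/* console_sem holder: mark itself as owner right before calling the drivers. */
static void owner_enable(struct task *cur)
{
	assert(cur != NULL);
	owner_lock_acquire();
	console_owner = cur;
	owner_lock_release();
}

/*
 * console_sem holder, after writing one message: clear the owner and check
 * for a waiter. Returns true if there was one; the caller must then stop
 * printing and must NOT release console_sem, since ownership of the
 * semaphore passes to the waiter.
 */
static bool owner_disable_and_check(void)
{
	bool waiter;

	owner_lock_acquire();
	waiter = atomic_load(&console_waiter);
	console_owner = NULL;
	owner_lock_release();

	if (waiter)
		atomic_store(&console_waiter, false); /* releases the spinning waiter */
	return waiter;
}

/*
 * printk() path after console_trylock() failed: register as the waiter only
 * if another task is actively printing and nobody is waiting yet.
 */
static bool try_become_waiter(struct task *cur)
{
	bool registered = false;

	owner_lock_acquire();
	if (console_owner != NULL && console_owner != cur &&
	    !atomic_load(&console_waiter)) {
		atomic_store(&console_waiter, true);
		registered = true;
	}
	owner_lock_release();
	return registered;
}

/* A registered waiter spins here; on return it owns console_sem. */
static void waiter_spin(void)
{
	while (atomic_load(&console_waiter))
		; /* the kernel would use cpu_relax() here */
}
```

In this model, a printk() caller that wins try_become_waiter() calls waiter_spin() and then continues as if it had taken console_sem itself, while the previous owner, seeing owner_disable_and_check() return true, returns without releasing the semaphore.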

From Petr Mladek, about possible new deadlocks:

The thing is that we move console_sem only to a printk() call
that would normally call console_unlock() as well. It means that
the transferred ownership should not bring new types of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."

We could look at it from this side. The possible deadlock would
look like:

CPU0                            CPU1

console_unlock()

  console_owner = current;

                                spin_lockA()
                                  printk()
                                    spin = true;
                                    while (...)

    call_console_drivers()
      spin_lockA()

This would be a deadlock. CPU0 would wait for lock A, while CPU1
would hold lock A and wait for CPU0 to finish calling the console
drivers and pass it ownership of console_sem.

But if the above is true, then the following scenario was
already possible before:

CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
        spin_lockA()

In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by lock A.

From Steven Rostedt:

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8-CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.

 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
 #include <linux/hrtimer.h>

 static bool stop_testing;
 static unsigned int loops = 1;

 /*
  * Runs on every CPU: floods the console with pr_emerg() while
  * preemption is disabled, sleeping 1 ms after each batch of "loops".
  */
 static void preempt_printk_workfn(struct work_struct *work)
 {
 	int i;

 	while (!READ_ONCE(stop_testing)) {
 		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
 			preempt_disable();
 			pr_emerg("%5d%-75s\n", smp_processor_id(),
 				 " XXX NOPREEMPT");
 			preempt_enable();
 		}
 		msleep(1);
 	}
 }

 static struct work_struct __percpu *works;

 /* Tell the workers to stop, wait for each one, then free them. */
 static void finish(void)
 {
 	int cpu;

 	WRITE_ONCE(stop_testing, true);
 	for_each_online_cpu(cpu)
 		flush_work(per_cpu_ptr(works, cpu));
 	free_percpu(works);
 }

 static int __init test_init(void)
 {
 	int cpu;

 	works = alloc_percpu(struct work_struct);
 	if (!works)
 		return -ENOMEM;

 	/*
 	 * This is just a test module. This will break if you
 	 * do any CPU hot plugging between loading and
 	 * unloading the module.
 	 */

 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);

 		INIT_WORK(work, &preempt_printk_workfn);
 		schedule_work_on(cpu, work);
 	}

 	return 0;
 }

 static void __exit test_exit(void)
 {
 	finish();
 }

 module_param(loops, uint, 0);
 module_init(test_init);
 module_exit(test_exit);
 MODULE_LICENSE("GPL");

Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Linus Torvalds [Tue, 21 Nov 2017 15:28:13 +0000 (05:28 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - print the warning about dropped messages on consoles on a separate
   line. It makes it more legible.

 - one typo fix and small code clean up.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  added new line symbol after warning about dropped messages
  printk: fix typo in printk_safe.c
  printk: simplify no_printk()

Linus Torvalds [Tue, 21 Nov 2017 07:50:24 +0000 (21:50 -1000)]
Merge tag 'fbdev-v4.15' of git://github.com/bzolnier/linux

Pull fbdev updates from Bartlomiej Zolnierkiewicz:
 "There is nothing really major here (though removal of the dead igafb
  driver stands out in diffstat).

  Summary:

   - convert timers to use timer_setup() (Kees Cook, Thierry Reding)

   - fix panels support on iMX boards in mxsfb driver (Stefan Agner)

   - fix timeout on EDID read in udlfb driver (Ladislav Michl)

   - add missing modes to fix out of bounds access in controlfb driver
     (Geert Uytterhoeven)

   - update initialisation paths in sa1100fb driver to be more robust
     (Russell King)

   - fix error handling path of ->probe method in au1200fb driver
     (Christophe JAILLET)

   - fix handling of cases when either panel or crt is defined in
     sm501fb driver (Sudip Mukherjee, Colin Ian King)

   - add ability to the Goldfish FB driver to be recognized by OS via DT
     (Aleksandar Markovic)

   - structures constifications (Bhumika Goyal)

   - misc fixes (Allen Pais, Gustavo A. R. Silva, Dan Carpenter)

   - misc cleanups (Colin Ian King, Himanshu Jha, Markus Elfring)

   - remove dead igafb driver"

* tag 'fbdev-v4.15' of git://github.com/bzolnier/linux: (42 commits)
  OMAPFB: prevent buffer underflow in omapfb_parse_vram_param()
  video: fbdev: sm501fb: fix potential null pointer dereference on fbi
  fbcon: Initialize ops->info early
  video: fbdev: Convert timers to use timer_setup()
  video: fbdev: pxa3xx_gcu: Convert timers to use timer_setup()
  fbdev: controlfb: Add missing modes to fix out of bounds access
  video: fbdev: sis_main: mark expected switch fall-throughs
  video: fbdev: cirrusfb: mark expected switch fall-throughs
  video: fbdev: aty: radeon_pm: mark expected switch fall-throughs
  video: fbdev: sm501fb: mark expected switch fall-through in sm501fb_blank_crt
  video: fbdev: intelfb: remove redundant variables
  video/fbdev/dnfb: Use common error handling code in dnfb_probe()
  sm501fb: suspend and resume fb if it exists
  sm501fb: unregister framebuffer only if registered
  sm501fb: deallocate colormap only if allocated
  video: goldfishfb: Add support for device tree bindings
  Documentation: Add device tree binding for Goldfish FB driver
  video: udlfb: Fix read EDID timeout
  video: fbdev: remove dead igafb driver
  video: fbdev: mxsfb: fix pixelclock polarity
  ...

Linus Torvalds [Tue, 21 Nov 2017 07:38:41 +0000 (21:38 -1000)]
Merge tag 'devicetree-fixes-for-4.15' of git://git./linux/kernel/git/robh/linux

Pull DeviceTree fixes from Rob Herring:

 - Remove mc13892 as a trivial device

 - Improve of_find_node_by_name() documentation

 - Fix unit test dtc warnings

 - Clean-ups of USB binding documentation

 - Fix potential NULL deref in of_pci_map_rid

* tag 'devicetree-fixes-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: trivial-devices: Remove fsl,mc13892
  of: Document exactly what of_find_node_by_name() puts
  of: unittest: disable interrupts_property warning
  of: unittest: let dtc generate __local_fixups__
  dt-bindings: usb: document hub and host-controller properties
  dt-bindings: usb: clean up compatible property
  dt-bindings: usb: fix reg-property port-number range
  dt-bindings: usb: fix example hub node name
  of/pci: Fix theoretical NULL dereference

Linus Torvalds [Tue, 21 Nov 2017 07:35:25 +0000 (21:35 -1000)]
Merge tag 'jfs-4.15-2' of git://github.com/kleikamp/linux-shaggy

Pull jfs fixlet from Dave Kleikamp:
 "Update jfs git tree in MAINTAINERS"

* tag 'jfs-4.15-2' of git://github.com/kleikamp/linux-shaggy:
  MAINTAINERS: fix jfs tree location

Jonathan Neuschäfer [Sat, 18 Nov 2017 02:22:32 +0000 (03:22 +0100)]
dt-bindings: trivial-devices: Remove fsl,mc13892

This device's bindings are not trivial: additional properties are
documented in Documentation/devicetree/bindings/mfd/mc13xxx.txt.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Stephen Boyd [Fri, 17 Nov 2017 16:53:21 +0000 (08:53 -0800)]
of: Document exactly what of_find_node_by_name() puts

It isn't clear if this function of_node_put()s the 'from'
argument, or the node it searches. Clearly indicate which
variable is touched. Fold in some more fixes from Randy too
because we're in the area.

Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tom Saeger [Mon, 20 Nov 2017 17:41:41 +0000 (11:41 -0600)]
MAINTAINERS: fix jfs tree location

JFS tree has been moved to github.

Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Linus Torvalds [Mon, 20 Nov 2017 06:41:53 +0000 (20:41 -1000)]
Merge tag 'ntb-4.15' of git://github.com/jonmason/ntb

Pull ntb updates from Jon Mason:
 "Support for the switchtec ntb and related changes. Also, a couple of
  bug fixes"

[ The timing isn't great. I had asked people to send me pull requests
  before my family vacation, and this code has not even been in
  linux-next as far as I can tell. But Logan Gunthorpe pleaded for its
  inclusion because the Switchtec driver has apparently been around for
  a while, just never in linux-next - Linus ]

* tag 'ntb-4.15' of git://github.com/jonmason/ntb:
  ntb: intel: remove b2b memory window workaround for Skylake NTB
  NTB: make idt_89hpes_cfg const
  NTB: switchtec_ntb: Update switchtec documentation with notes for NTB
  NTB: switchtec_ntb: Add memory window support
  NTB: switchtec_ntb: Implement scratchpad registers
  NTB: switchtec_ntb: Implement doorbell registers
  NTB: switchtec_ntb: Add link management
  NTB: switchtec_ntb: Add skeleton NTB driver
  NTB: switchtec_ntb: Initialize hardware for doorbells and messages
  NTB: switchtec_ntb: Initialize hardware for memory windows
  NTB: switchtec_ntb: Introduce initial NTB driver
  NTB: Add check and comment for link up to mw_count() and mw_get_align()
  NTB: Ensure ntb_mw_get_align() is only called when the link is up
  NTB: switchtec: Add link event notifier callback
  NTB: switchtec: Add NTB hardware register definitions
  NTB: switchtec: Export class symbol for use in upper layer driver
  NTB: switchtec: Move structure definitions into a common header
  ntb: update maintainer list for Intel NTB driver

Roberto Sassu [Tue, 7 Nov 2017 10:37:07 +0000 (11:37 +0100)]
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS

Commit b65a9cfc2c38 ("Untangling ima mess, part 2: deal with counters")
moved the call of ima_file_check() from may_open() to do_filp_open() at a
point where the file descriptor is already opened.

This breaks the assumption made by IMA that file descriptors being closed
belong to files whose access was granted by ima_file_check(). The
consequence is that security.ima and security.evm are updated with good
values, regardless of the current appraisal status.

For example, if a file does not have security.ima, IMA will create it after
opening the file for writing, even if access is denied. Access to the file
will be allowed afterwards.

Avoid this issue by checking the appraisal status before updating
security.ima.

Cc: stable@vger.kernel.org
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Sun, 19 Nov 2017 18:04:41 +0000 (08:04 -1000)]
Merge git://git./linux/kernel/git/davem/ide

Pull small IDE cleanup from David Miller.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
  PNP: ide: constify pnp_device_id

Dave Jiang [Fri, 10 Nov 2017 23:45:27 +0000 (16:45 -0700)]
ntb: intel: remove b2b memory window workaround for Skylake NTB

The workaround code is never used because Skylake NTB does not need it.

Reported-by: Allen Hubbe <allen.hubbe@dell.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Bhumika Goyal [Fri, 11 Aug 2017 17:47:43 +0000 (23:17 +0530)]
NTB: make idt_89hpes_cfg const

Make these const as they are only used during a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:54 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Update switchtec documentation with notes for NTB

The switchtec_ntb driver has a couple of requirements on the Switchtec
hardware configuration, so we add these notes to the documentation.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:53 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Add memory window support

The Switchtec hardware has two types of memory windows: LUTs and Direct.
The first area in each BAR is for LUT windows and the remaining area is
for the direct region. The total number of LUT entries is set by a
configuration setting in hardware and they all must be the same
size. (This is fixed by switchtec_ntb to be 64K.)

switchtec_ntb enables the LUTs only for the first BAR and enables the
highest power-of-two number of LUTs possible. Since the LUTs are at the
beginning of the BAR, the direct memory window's alignment is affected.
Therefore, the maximum direct memory window size cannot be greater than
the number of LUTs times 64K. The direct windows in other BARs will not
have this restriction, as the LUTs will not be enabled there. LUTs will
only be exposed through the NTB API if the use_lut_mw parameter is set.

Since the Switchtec hardware configures BARs to be 4G by default, a
module parameter is provided to limit the size of the advertised memory
windows. Higher layers tend to allocate the maximum BAR size, and this
tends to fail when they try to allocate 4GB of contiguous memory.
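The sizing rules above can be sketched as arithmetic. This is an illustration under assumptions: the LUT entry size is the fixed 64K from the message, but direct_mw_max() and its `limit` argument (standing in for the size-limiting module parameter, whose name is not given here) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_SIZE (64ULL * 1024) /* LUT entry size fixed by switchtec_ntb */

/* Largest power of two <= n (0 for n == 0). */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical: maximum direct window size advertised for one BAR.
 * bar_size - size of the BAR (4G by default)
 * nr_luts  - LUT entries enabled in this BAR (0 for BARs without LUTs)
 * limit    - optional cap from a module parameter (0 = no cap)
 */
static uint64_t direct_mw_max(uint64_t bar_size, unsigned int nr_luts,
			      uint64_t limit)
{
	uint64_t lut_area = (uint64_t)nr_luts * LUT_SIZE;
	uint64_t size;

	assert(lut_area <= bar_size);
	size = bar_size - lut_area;
	/* The direct window starts right after the LUT area, so its
	 * alignment, and hence its maximum size, is bounded by that area. */
	if (nr_luts && size > lut_area)
		size = lut_area;
	if (limit && size > limit)
		size = limit;
	return size;
}
```

For example, with 100 LUT entries configured, 64 would be enabled, capping the first BAR's direct window at 64 × 64K = 4 MiB even though the BAR itself is 4G.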

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:52 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Implement scratchpad registers

Since there is no dedicated hardware for this, we simply add
these as entries in the shared memory window. Thus, we could support
any number of them, but 128 seems like enough for now.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:51 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Implement doorbell registers

Pretty straightforward implementation of doorbell registers.
The shift and mask were set up in an earlier patch, and this just hooks
up the appropriate portion of the IDB register as the local doorbells
and the opposite portion of ODB as the peer doorbells. The DB mask is
protected by a spinlock to avoid concurrent read-modify-write accesses.
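A userspace sketch of the doorbell handling described above: this side's doorbells are a shifted, masked portion of a 64-bit register, and the DB mask is updated with read-modify-write under a lock. The struct and function names are hypothetical, an atomic_flag stands in for the kernel spinlock, and the shift of 32 and 28-bit valid mask used in the example are assumed values.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-side doorbell state; field names are illustrative. */
struct db_state {
	atomic_flag lock;       /* stands in for the spinlock around db_mask */
	unsigned int db_shift;  /* bit offset of this side's doorbells */
	uint64_t db_valid_mask; /* doorbells this side may use */
	uint64_t db_mask;       /* doorbells currently masked */
};

static void db_init(struct db_state *s, unsigned int shift, uint64_t valid)
{
	atomic_flag_clear(&s->lock);
	s->db_shift = shift;
	s->db_valid_mask = valid;
	s->db_mask = 0;
}

static void db_lock(struct db_state *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		; /* spin */
}

static void db_unlock(struct db_state *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Read-modify-write of db_mask under the lock so concurrent updates are not lost. */
static int db_set_mask(struct db_state *s, uint64_t bits)
{
	if (bits & ~s->db_valid_mask)
		return -EINVAL;
	db_lock(s);
	s->db_mask |= bits;
	db_unlock(s);
	return 0;
}

static int db_clear_mask(struct db_state *s, uint64_t bits)
{
	if (bits & ~s->db_valid_mask)
		return -EINVAL;
	db_lock(s);
	s->db_mask &= ~bits;
	db_unlock(s);
	return 0;
}

/* Extract this side's doorbells from a raw 64-bit doorbell register value. */
static uint64_t db_read(const struct db_state *s, uint64_t reg)
{
	assert(s->db_shift < 64);
	return (reg >> s->db_shift) & s->db_valid_mask;
}
```

The lock matters only for the mask: two callers doing `db_mask |= a` and `db_mask &= ~b` at the same time could otherwise overwrite each other's result.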

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:50 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Add link management

switchtec_ntb checks for a link by looking at the shared memory
window. If the magic number is correct and the other side indicates
their link is enabled then we take the link to be up.

Whenever we change our local link status, we send a message to the
other side so that it checks whether the link is up and updates its status.

The current status is maintained in a flag so ntb_is_link_up
can return quickly.

We also use Switchtec's link status notifier to check for link
changes when the switch notices that a port has changed state.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:49 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Add skeleton NTB driver

Add a skeleton NTB driver which will be filled out in subsequent patches.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:48 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Initialize hardware for doorbells and messages

Set up some hardware registers and create interrupt service routines
for the doorbells and messages.

There are 64 doorbells in the switch that are shared between all
partitions. The upper 4 doorbells are also shared with the messages
and are therefore not used. Thus, this provides 28 doorbells for each
partition.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:47 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Initialize hardware for memory windows

Add the code to initialize the memory windows in the hardware.
This includes setting up the requester ID table, and figuring out
which BAR corresponds to which memory window (the switch can be
configured with any number of BARs).

Also, since the device doesn't have hardware for scratchpads or
determining the link status, we create a shared memory window that has
these features. A magic number with a version component will be used
to determine if the other side's driver is actually up.

The shared memory window also informs the other side of the
size and count of the local memory windows.
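A sketch of a shared memory window that carries a magic number with a version component plus the local memory window count and sizes. The layout, the identifier and version values, and the function names are assumptions for illustration, not the driver's actual format.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAGIC_ID      0x45CCu /* hypothetical driver identifier */
#define MAGIC_VERSION 1u      /* hypothetical layout version */
#define MAX_MWS       8

/* Hypothetical layout of the shared memory window. */
struct shared_mw {
	uint32_t magic;             /* MAGIC_ID in high 16 bits, version in low 16 */
	uint32_t mw_count;          /* number of local memory windows */
	uint64_t mw_sizes[MAX_MWS]; /* size of each local memory window */
};

static void shared_mw_publish(struct shared_mw *s, const uint64_t *sizes,
			      uint32_t count)
{
	assert(count <= MAX_MWS);
	for (uint32_t i = 0; i < count; i++)
		s->mw_sizes[i] = sizes[i];
	s->mw_count = count;
	/* Magic is written last; a real driver would also need a write barrier. */
	s->magic = (MAGIC_ID << 16) | MAGIC_VERSION;
}

/* Returns 0 and the peer's window size, -ENODEV if the peer's driver is not
 * up (wrong id or version), or -EINVAL for a bad window index. */
static int peer_mw_size(const struct shared_mw *peer, uint32_t idx,
			uint64_t *size)
{
	if ((peer->magic >> 16) != MAGIC_ID ||
	    (peer->magic & 0xFFFFu) != MAGIC_VERSION)
		return -ENODEV;
	if (idx >= peer->mw_count || idx >= MAX_MWS)
		return -EINVAL;
	*size = peer->mw_sizes[idx];
	return 0;
}
```

Splitting the magic into an identifier and a version lets a side reject a peer running an incompatible layout rather than misreading its fields.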

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:46 +0000 (12:19 -0600)]
NTB: switchtec_ntb: Introduce initial NTB driver

Since the Switchtec NTB hardware shares the same endpoint as the
management endpoint, we use the class_interface API to register
an NTB driver for every Switchtec device in the system that has the
NTB class code.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:45 +0000 (12:19 -0600)]
NTB: Add check and comment for link up to mw_count() and mw_get_align()

Adds a comment and a check to ntb_mw_get_align() so that it always fails
if the function is called before the link is up.

Also adds a comment to ntb_mw_count() to note that it may return 0 if
it is called before the link is up.

This is to prevent accidental misuse in clients that are testing
on hardware where this doesn't matter.
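A userspace sketch of the guard this commit adds: the alignment query fails until the link to the given peer is up, because before then the peer may not have configured its windows. `struct fake_ntb` and mw_get_align() are hypothetical stand-ins for the NTB device and API, and returning -ENOTCONN is an assumption about the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_PEERS 4

/* Hypothetical stand-in for the NTB device. */
struct fake_ntb {
	uint64_t link_up_mask;          /* bit N set while the link to peer N is up */
	uint64_t peer_align[MAX_PEERS]; /* alignment each peer advertised */
};

/* Refuse to report alignment until the link to peer pidx is up. */
static int mw_get_align(const struct fake_ntb *ntb, int pidx,
			uint64_t *addr_align)
{
	assert(pidx >= 0 && pidx < MAX_PEERS);
	if (!(ntb->link_up_mask & (1ULL << pidx)))
		return -ENOTCONN;
	*addr_align = ntb->peer_align[pidx];
	return 0;
}
```

A client following this rule queries the alignment from its link-up handling rather than at probe time, which is the change the next commit makes to ntb_transport and ntb_perf.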

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:44 +0000 (12:19 -0600)]
NTB: Ensure ntb_mw_get_align() is only called when the link is up

With Switchtec hardware it's impossible to get the alignment parameters
for a peer's memory window until the peer's driver has configured its
windows. Strictly speaking, the link doesn't have to be up for this,
but the link being up is the only way the client can tell that
the other side has been configured.

This patch converts ntb_transport and ntb_perf to use this function after
the link goes up. This simplifies these clients slightly because they
no longer have to store the alignment parameters. It also tweaks
ntb_tool so that peer_mw_trans will print zero if it is run before
the link goes up.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:43 +0000 (12:19 -0600)]
NTB: switchtec: Add link event notifier callback

In order for the Switchtec NTB code to handle link change events we
create a notifier callback in the switchtec code which gets called
whenever an appropriate event interrupt occurs.

In order to preserve userspace's ability to follow these events,
we compare the event count with a copy stored the last time we
checked.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:42 +0000 (12:19 -0600)]
NTB: switchtec: Add NTB hardware register definitions

There are two additional regions: ctrl and dbmsg. The first is
for generic NTB control and memory windows. The second is for doorbells
and message registers. This patch also adds a number of related
constants for using these registers.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:41 +0000 (12:19 -0600)]
NTB: switchtec: Export class symbol for use in upper layer driver

We export the class pointer symbol and add an extern declaration in the
Switchtec header file.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Logan Gunthorpe [Thu, 3 Aug 2017 18:19:40 +0000 (12:19 -0600)]
NTB: switchtec: Move structure definitions into a common header

Create the switchtec.h header in include/linux with hardware defines
and the switchtec_dev structure. Both moved directly from switchtec.c.
This is a prep patch for creating an NTB driver for Switchtec.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Fri, 28 Jul 2017 22:16:04 +0000 (15:16 -0700)]
ntb: update maintainer list for Intel NTB driver

Removing Jon since he no longer works at Intel.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Linus Torvalds [Sat, 18 Nov 2017 20:09:51 +0000 (12:09 -0800)]
clean up x86 platform driver default values

The updates this merge window added several bogus default enablements for
new features.  We don't do that.  If people want new behavior, they ask
for it.

One 'default n' was also removed as pointless.  That's great, but there
were eight other ones in the same file that were left alone.

Fix it up.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 18 Nov 2017 19:22:04 +0000 (11:22 -0800)]
Merge tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Lots of good bugfixes, including:

   -  fix a number of races in the NFSv4+ state code

   -  fix some shutdown crashes in multiple-network-namespace cases

   -  relax our 4.1 session limits; if you've an artificially low limit
      to the number of 4.1 clients that can mount simultaneously, try
      upgrading"

* tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits)
  SUNRPC: Improve ordering of transport processing
  nfsd: deal with revoked delegations appropriately
  svcrdma: Enqueue after setting XPT_CLOSE in completion handlers
  nfsd: use nfs->ns.inum as net ID
  rpc: remove some BUG()s
  svcrdma: Preserve CB send buffer across retransmits
  nfds: avoid gettimeofday for nfssvc_boot time
  fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t
  fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t
  fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t
  lockd: double unregister of inetaddr notifiers
  nfsd4: catch some false session retries
  nfsd4: fix cached replies to solo SEQUENCE compounds
  sunrcp: make function _svc_create_xprt static
  SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
  nfsd: use ARRAY_SIZE
  nfsd: give out fewer session slots as limit approaches
  nfsd: increase DRC cache limit
  nfsd: remove unnecessary nofilehandle checks
  nfs_common: convert int to bool
  ...

Linus Torvalds [Sat, 18 Nov 2017 18:26:57 +0000 (10:26 -0800)]
Merge tag 'platform-drivers-x86-v4.15-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver updates from Andy Shevchenko:
 "Here is the collected material against Platform Drivers x86 subsystem.
  It's a rather busy cycle for PDx86, mostly due to Dell SMBIOS driver
  activity.

  For this cycle we have quite an update for the Dell SMBIOS driver
  including WMI work to provide an interface for SMBIOS tokens via sysfs
  and WMI support for 2017+ Dell laptop models. SMM dispatcher code is
  split into a separate driver followed by a new WMI dispatcher. The
  latter provides a character device interface to user space.

  The git history also contains a merge of immutable branch from Wolfram
  Sang in order to apply a dependent fix to the Intel CherryTrail
  Battery Management driver.

  Other Intel drivers got a lot of cleanups. The Turbo Boost Max 3.0
  support is added for Intel Skylake.

  Peaq WMI hotkeys driver gets its own maintainer and white list of
  supported models.

  Silead DMI is expanded to support a few additional platforms.

  Tablet mode via GMMS ACPI method is added to support some ThinkPad
  tablets.

  new driver:
   - Add driver to force WMI Thunderbolt controller power status

  asus-wmi:
   -  Add lightbar led support

  dell-laptop:
   -  Allocate buffer before rfkill use

  dell-smbios:
   -  fix string overflow
   -  Add filtering support
   -  Introduce dispatcher for SMM calls
   -  Add a sysfs interface for SMBIOS tokens
   -  only run if proper oem string is detected
   -  Prefix class/select with cmd_
   -  Add pr_fmt definition to driver

  dell-smbios-smm:
   -  test for WSMT

  dell-smbios-wmi:
   -  release mutex lock on WMI call failure
   -  introduce userspace interface
   -  Add new WMI dispatcher driver

  dell-smo8800:
   -  remove redundant assignments to byte_data

  dell-wmi:
   -  don't check length returned
   -  clean up wmi descriptor check
   -  increase severity of some failures
   -  Do not match on descriptor GUID modalias
   -  Label driver as handling notifications

  dell-*wmi*:
   -  Relay failed initial probe to dependent drivers

  dell-wmi-descriptor:
   -  check if memory was allocated
   -  split WMI descriptor into its own driver

  fujitsu-laptop:
   -  Fix radio LED detection
   -  Don't oops when FUJ02E3 is not present

  hp_accel:
   -  Add quirk for HP ProBook 440 G4

  hp-wmi:
   -  Fix tablet mode detection for convertibles

  ideapad-laptop:
   -  Add Lenovo Yoga 920-13IKB to no_hw_rfkill dmi list

  intel_cht_int33fe:
   -  Update fusb302 type string, add properties
   -  make a couple of local functions static
   -  Work around BIOS bug on some devices

  intel-hid:
   -  Power button suspend on Dell Latitude 7275

  intel_ips:
   -  Convert timers to use timer_setup()
   -  Remove FSF address from GPL notice
   -  Remove unneeded fields and label
   -  Keep pointer to struct device
   -  Use PCI_VDEVICE() macro
   -  Switch to new PCI IRQ allocation API
   -  Simplify error handling via devres API

  intel_pmc_ipc:
   -  Revert Use MFD framework to create dependent devices
   -  Use MFD framework to create dependent devices
   -  Use spin_lock to protect GCR updates
   -  Use devm_* calls in driver probe function

  intel_punit_ipc:
   -  Fix resource ioremap warning

  intel_telemetry:
   -  Remove useless default in Kconfig
   -  Add needed inclusion
   -  cleanup redundant headers
   -  Fix typos
   -  Fix load failure info

  intel_telemetry_debugfs:
   -  Use standard ARRAY_SIZE() macro

  intel_turbo_max_3:
   -  Add Skylake platform

  intel-wmi-thunderbolt:
   -  Silence error cases

  mlx-platform:
   -  make a couple of structures static

  peaq_wmi:
   -  Fix missing terminating entry for peaq_dmi_table

  peaq-wmi:
   -  Remove unnecessary checks from peaq_wmi_exit
   -  Add DMI check before binding to the WMI interface
   -  Revert Blacklist Lenovo ideapad 700-15ISK
   -  Blacklist Lenovo ideapad 700-15ISK

  silead_dmi:
   -  Add silead, home-button property to some tablets
   -  Add entry for the Digma e200 tablet
   -  Fix GP-electronic T701 entry
   -  Add entry for the Chuwi Hi8 Pro tablet

  sony-laptop:
   -  Drop variable assignment in sony_nc_setup_rfkill()
   -  Fix error handling in sony_nc_setup_rfkill()

  thinkpad_acpi:
   -  Implement tablet mode using GMMS method

  tools/wmi:
   -  add a sample for dell smbios communication over WMI

  wmi:
   -  release mutex on module acquisition failure
   -  create userspace interface for drivers
   -  Don't allow drivers to get each other's GUIDs
   -  Add new method wmidev_evaluate_method
   -  Destroy on cleanup rather than unregister
   -  Cleanup exit routine in reverse order of init
   -  Sort include list"

* tag 'platform-drivers-x86-v4.15-1' of git://git.infradead.org/linux-platform-drivers-x86: (74 commits)
  platform/x86: silead_dmi: Add silead, home-button property to some tablets
  platform/x86: dell-laptop: Allocate buffer before rfkill use
  platform/x86: dell-*wmi*: Relay failed initial probe to dependent drivers
  platform/x86: dell-wmi-descriptor: check if memory was allocated
  platform/x86: Revert intel_pmc_ipc: Use MFD framework to create dependent devices
  platform/x86: dell-smbios-wmi: release mutex lock on WMI call failure
  platform/x86: wmi: release mutex on module acquistion failure
  platform/x86: dell-smbios: fix string overflow
  platform/x86: intel_pmc_ipc: Use MFD framework to create dependent devices
  platform/x86: intel_punit_ipc: Fix resource ioremap warning
  platform/x86: dell-smo8800: remove redundant assignments to byte_data
  platform/x86: hp-wmi: Fix tablet mode detection for convertibles
  platform/x86: intel_ips: Convert timers to use timer_setup()
  platform/x86: sony-laptop: Drop variable assignment in sony_nc_setup_rfkill()
  platform/x86: sony-laptop: Fix error handling in sony_nc_setup_rfkill()
  tools/wmi: add a sample for dell smbios communication over WMI
  platform/x86: dell-smbios-wmi: introduce userspace interface
  platform/x86: wmi: create userspace interface for drivers
  platform/x86: dell-smbios: Add filtering support
  platform/x86: dell-smbios-smm: test for WSMT
  ...

7 years agoplatform/x86: silead_dmi: Add silead, home-button property to some tablets
Hans de Goede [Thu, 19 Oct 2017 07:17:28 +0000 (09:17 +0200)]
platform/x86: silead_dmi: Add silead, home-button property to some tablets

Add "silead,home-button" property to entries for tablets which have
a capacitive home button (typically a windows logo on the front).

This new property is checked for by the new capacitive home button
support in the silead touchscreen driver.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Sat, 18 Nov 2017 04:21:44 +0000 (20:21 -0800)]
Merge git://git./linux/kernel/git/davem/sparc

Pull sparc updates from David Miller:

 1) Add missing cmpxchg64() for 32-bit sparc.

 2) Timer conversions from Allen Pais and Kees Cook.

 3) vDSO support, from Nagarathnam Muthusamy.

 4) Fix sparc64 huge page table walks based upon bug report by Al Viro,
    from Nitin Gupta.

 5) Optimized fls() for T4 and above, from Vijay Kumar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix page table walk for PUD hugepages
  sparc64: Convert timers to user timer_setup()
  sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t
  sparc/led: Convert timers to use timer_setup()
  sparc64: Use sparc optimized fls and __fls for T4 and above
  sparc64: SPARC optimized __fls function
  sparc64: SPARC optimized fls function
  sparc64: Define SPARC default __fls function
  sparc64: Define SPARC default fls function
  vDSO for sparc
  sparc32: Add cmpxchg64().
  sbus: char: Move D7S_MINOR to include/linux/miscdevice.h
  sparc: time: Remove unneeded linux/miscdevice.h include
  sparc64: mmu_context: Add missing include files
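The "optimized fls()" item concerns a small bit-scanning primitive: fls(x) returns the 1-based position of the most significant set bit of x, or 0 when x is 0. As a minimal userspace sketch (not the sparc64 code), here is a portable shift-and-test version in the style of a generic fallback, next to a version built on the GCC/Clang builtin `__builtin_clz`, which stands in for a hardware count-leading-zeros instruction. Both assume a 32-bit `unsigned int`; the function names are hypothetical.

```c
/* Portable fallback: binary search for the highest set bit (assumes 32-bit unsigned int). */
static int fls_generic(unsigned int x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
    if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
    if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
    if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
    if (!(x & 0x80000000u)) { x <<= 1;  r -= 1;  }
    return r;
}

/* Count-leading-zeros variant; __builtin_clz(0) is undefined, so 0 is handled first. */
static int fls_clz(unsigned int x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}
```

The second form is what a native leading-zero-count instruction makes possible: one instruction plus a subtraction instead of up to five test-and-shift steps.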

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 18 Nov 2017 04:18:37 +0000 (20:18 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Revert regression inducing change to the IPSEC template resolver,
    from Steffen Klassert.

 2) Peeloffs can cause the wrong sk to be woken up in SCTP, fix from Xin
    Long.

 3) Min packet MTU size is wrong in cpsw driver, from Grygorii Strashko.

 4) Fix build failure in netfilter ctnetlink, from Arnd Bergmann.

 5) ISDN hisax driver checks pnp_irq() for errors incorrectly, from
    Arvind Yadav.

 6) Fix fealnx driver build failure on MIPS, from Huacai Chen.

 7) Fix info leak in SCTP: the scope_id of socket addresses is not
    always filled in, from Eric W. Biederman.

 8) MTU inheritance between physical function and representor fix in nfp
    driver, from Dirk van der Merwe.

 9) Fix memory leak in rsi driver, from Colin Ian King.

10) Fix expiration and generation ID handling of cached ipv4 redirect
    routes, from Xin Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
  net: usb: hso.c: remove unneeded DRIVER_LICENSE #define
  ibmvnic: fix dma_mapping_error call
  ipvlan: NULL pointer dereference panic in ipvlan_port_destroy
  route: also update fnhe_genid when updating a route cache
  route: update fnhe_expires for redirect when the fnhe exists
  sctp: set frag_point in sctp_setsockopt_maxseg correctly
  rsi: fix memory leak on buf and usb_reg_buf
  net/netlabel: Add list_next_rcu() in rcu_dereference().
  nfp: remove false positive offloads in flower vxlan
  nfp: register flower reprs for egress dev offload
  nfp: inherit the max_mtu from the PF netdev
  nfp: fix vlan receive MAC statistics typo
  nfp: fix flower offload metadata flag usage
  virto_net: remove empty file 'virtio_net.'
  net/sctp: Always set scope_id in sctp_inet6_skb_msgname
  fealnx: Fix building error on MIPS
  isdn: hisax: Fix pnp_irq's error checking for setup_teles3
  isdn: hisax: Fix pnp_irq's error checking for setup_sedlbauer_isapnp
  isdn: hisax: Fix pnp_irq's error checking for setup_niccy
  isdn: hisax: Fix pnp_irq's error checking for setup_ix1micro
  ...

7 years agoMerge tag 'hwlock-v4.15' of git://github.com/andersson/remoteproc
Linus Torvalds [Sat, 18 Nov 2017 04:16:20 +0000 (20:16 -0800)]
Merge tag 'hwlock-v4.15' of git://github.com/andersson/remoteproc

Pull hwspinlock update from Bjorn Andersson:
 "This changes the HWSPINLOCK core Kconfig option to bool, to aid when
  other core code depends on it"

* tag 'hwlock-v4.15' of git://github.com/andersson/remoteproc:
  hwspinlock: Change hwspinlock to a bool

7 years agoMerge tag 'rproc-v4.15' of git://github.com/andersson/remoteproc
Linus Torvalds [Sat, 18 Nov 2017 04:14:10 +0000 (20:14 -0800)]
Merge tag 'rproc-v4.15' of git://github.com/andersson/remoteproc

Pull remoteproc updates from Bjorn Andersson:
 "This adds an interface for configuring Qualcomm's "secure SMMU" and
  adds support for booting the modem Hexagon on MSM8996.

  Two new debugfs entries are added in the remoteproc core to introspect
  the list of memory carveouts and the loaded resource table"

* tag 'rproc-v4.15' of git://github.com/andersson/remoteproc:
  remoteproc: qcom: Fix error handling paths in order to avoid memory leaks
  remoteproc: qcom: Drop pr_err in q6v5_xfer_mem_ownership()
  remoteproc: debug: add carveouts list dump feature
  remoteproc: debug: add resource table dump feature
  remoteproc: qcom: Add support for mss remoteproc on msm8996
  remoteproc: qcom: Make secure world call for mem ownership switch
  remoteproc: qcom: refactor mss fw image loading sequence
  firmware: scm: Add new SCM call API for switching memory ownership

7 years agoMerge tag 'rpmsg-v4.15' of git://github.com/andersson/remoteproc
Linus Torvalds [Sat, 18 Nov 2017 04:12:08 +0000 (20:12 -0800)]
Merge tag 'rpmsg-v4.15' of git://github.com/andersson/remoteproc

Pull rpmsg updates from Bjorn Andersson:

 - turn RPMSG_VIRTIO into a user selectable config

 - fix a few bugs in GLINK

 - provide support for specifying initial buffer sizes for GLINK
   channels.

* tag 'rpmsg-v4.15' of git://github.com/andersson/remoteproc:
  rpmsg: glink: The mbox client knows_txdone
  rpmsg: glink: Add missing MODULE_LICENSE
  rpmsg: glink: Use best fit intent during tx
  rpmsg: glink: Add support to preallocate intents
  dt-bindings: soc: qcom: Support GLINK intents
  rpmsg: glink: Initialize the "intent_req_comp" completion variable
  rpmsg: Allow RPMSG_VIRTIO to be enabled via menuconfig or defconfig

7 years agoMerge tag 'hwmon-for-linus-v4.15-take2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 18 Nov 2017 04:10:05 +0000 (20:10 -0800)]
Merge tag 'hwmon-for-linus-v4.15-take2' of git://git./linux/kernel/git/groeck/linux-staging

Pull more hwmon updates/fixes from Guenter Roeck:

 - minor bug fix in k10temp driver

 - take advantage of added NULL check in i2c_unregister_device()

* tag 'hwmon-for-linus-v4.15-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (w83793) Remove duplicate NULL check
  hwmon: (w83792d) Remove duplicate NULL check
  hwmon: (w83791d) Remove duplicate NULL check
  hwmon: (w83781d) Remove duplicate NULL check
  hwmon: (k10temp) Correct model name for Ryzen 1600X

7 years agoMerge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Linus Torvalds [Sat, 18 Nov 2017 04:04:24 +0000 (20:04 -0800)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "We have two changes to the core framework this time around.

  The first is a large change that introduces runtime PM support to
  the clk framework. Now we properly call runtime PM operations on the
  device providing a clk when the clk is in use. This helps on SoCs
  where the clks provided by a device need something to be powered on
  before using the clks, like power domains or regulators. It also helps
  power those things down when clks aren't in use.

  The other core change is a devm API addition for clk providers so we
  can get rid of a bunch of clk driver remove functions that are just
  doing of_clk_del_provider().

  Outside of the core, we have the usual addition of clk drivers and
  smattering of non-critical fixes to existing drivers. The biggest diff
  is support for Mediatek MT2712 and MT7622 SoCs, but those patches
  really just add a bunch of data.

  By the way, we're trying something new here where we build the tree up
  with topic branches. We plan to work this into our workflow so that we
  don't step on each other's toes, and so the fixes branch can be merged
  on an as-needed basis.

  Summary:

  Core:
   - runtime PM support for clk providers
   - devm API for of_clk_add_hw_provider()

  New Drivers:
   - Mediatek MT2712 and MT7622
   - Renesas R-Car V3M SoC

  Updates:
   - runtime PM support for Samsung exynos5433/exynos4412 providers
   - removal of clkdev aliases on Samsung SoCs
   - convert clk-gpio to use gpio descriptors
   - various driver cleanups to match kernel coding style
   - Amlogic Video Processing Unit VPU and VAPB clks
   - sigma-delta modulation for Allwinner audio PLLs
   - Allwinner A83t Display clks
   - support for the second display unit clock on Renesas RZ/G1E
   - suspend/resume support for Renesas R-Car Gen3 CPG/MSSR
   - new clock ids for Rockchip rk3188 and rk3368 SoCs
   - various 'const' markings on clk_ops structures
   - RPM clk support on Qualcomm MSM8996/MSM8660 SoCs"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (137 commits)
  clk: stm32h7: fix test of clock config
  clk: pxa: fix building on older compilers
  clk: sunxi-ng: a83t: Fix i2c buses bits
  clk: ti: dra7-atl-clock: fix child-node lookups
  clk: qcom: common: fix legacy board-clock registration
  clk: uniphier: fix DAPLL2 clock rate of Pro5
  clk: uniphier: fix parent of miodmac clock data
  clk: hi3798cv200: correct parent mux clock for 'clk_sdio0_ciu'
  clk: hisilicon: Delete an error message for a failed memory allocation in hisi_register_clkgate_sep()
  clk: hi3660: fix incorrect uart3 clock freqency
  clk: kona-setup: Delete error messages for failed memory allocations
  ARC: clk: fix spelling mistake: "configurarion" -> "configuration"
  clk: cdce925: remove redundant check for non-null parent_name
  clk: versatile: Improve sizeof() usage
  clk: versatile: Delete error messages for failed memory allocations
  clk: ux500: Improve sizeof() usage
  clk: ux500: Delete error messages for failed memory allocations
  clk: spear: Delete error messages for failed memory allocations
  clk: ti: Delete error messages for failed memory allocations
  clk: mmp: Adjust checks for NULL pointers
  ...

7 years agoMerge tag 'kbuild-misc-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahi...
Linus Torvalds [Sat, 18 Nov 2017 01:51:33 +0000 (17:51 -0800)]
Merge tag 'kbuild-misc-v4.15' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild misc updates from Masahiro Yamada:

 - Clean up and fix RPM package build

 - Fix a warning in DEB package build

 - Improve coccicheck script

 - Improve some semantic patches

* tag 'kbuild-misc-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  docs: dev-tools: coccinelle: delete out of date wiki reference
  coccinelle: orplus: reorganize to improve performance
  coccinelle: use exists to improve efficiency
  builddeb: Pass the kernel:debarch substvar to dpkg-genchanges
  Coccinelle: use false positive annotation
  coccinelle: fix verbose message about .cocci file being run
  coccinelle: grep Options and Requires fields more precisely
  Coccinelle: make DEBUG_FILE option more useful
  coccinelle: api: detect identical chip data arrays
  coccinelle: Improve setup_timer.cocci matching
  Coccinelle: setup_timer: improve messages from setup_timer
  kbuild: rpm-pkg: do not force -jN in submake
  kbuild: rpm-pkg: keep spec file until make mrproper
  kbuild: rpm-pkg: fix jobserver unavailable warning
  kbuild: rpm-pkg: replace $RPM_BUILD_ROOT with %{buildroot}
  kbuild: rpm-pkg: fix build error when CONFIG_MODULES is disabled
  kbuild: rpm-pkg: refactor mkspec with here doc
  kbuild: rpm-pkg: clean up mkspec
  kbuild: rpm-pkg: install vmlinux.bz2 unconditionally
  kbuild: rpm-pkg: remove ppc64 specific image handling

7 years agoMerge tag 'kbuild-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Sat, 18 Nov 2017 01:45:29 +0000 (17:45 -0800)]
Merge tag 'kbuild-v4.15' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:
 "One of the most remarkable improvements in this cycle is that Kbuild is
  now able to cache the result of shell commands. Some variables are
  expensive to compute, for example, $(call cc-option,...) invokes the
  compiler. It is not efficient to redo this computation every time,
  even when we are not actually building anything. Kbuild creates a
  hidden file ".cache.mk" that contains invoked shell commands and their
  results. The speed-up should be noticeable.

  Summary:

   - Fix arch build issues (hexagon, sh)

   - Clean up various Makefiles and scripts

   - Fix wrong usage of {CFLAGS,LDFLAGS}_MODULE in arch Makefiles

   - Cache variables that are expensive to compute

   - Improve cc-ldoption and ld-option for Clang

   - Optimize output directory creation"

* tag 'kbuild-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
  kbuild: move coccicheck help from scripts/Makefile.help to top Makefile
  sh: decompressor: add shipped files to .gitignore
  frv: .gitignore: ignore vmlinux.lds
  selinux: remove unnecessary assignment to subdir-
  kbuild: specify FORCE in Makefile.headersinst as .PHONY target
  kbuild: remove redundant mkdir from ./Kbuild
  kbuild: optimize object directory creation for incremental build
  kbuild: create object directories simpler and faster
  kbuild: filter-out PHONY targets from "targets"
  kbuild: remove redundant $(wildcard ...) for cmd_files calculation
  kbuild: create directory for make cache only when necessary
  sh: select KBUILD_DEFCONFIG depending on ARCH
  kbuild: fix linker feature test macros when cross compiling with Clang
  kbuild: shrink .cache.mk when it exceeds 1000 lines
  kbuild: do not call cc-option before KBUILD_CFLAGS initialization
  kbuild: Cache a few more calls to the compiler
  kbuild: Add a cache for generated variables
  kbuild: add forward declaration of default target to Makefile.asm-generic
  kbuild: remove KBUILD_SUBDIR_ASFLAGS and KBUILD_SUBDIR_CCFLAGS
  hexagon/kbuild: replace CFLAGS_MODULE with KBUILD_CFLAGS_MODULE
  ...

7 years agonet: usb: hso.c: remove unneeded DRIVER_LICENSE #define
Greg Kroah-Hartman [Fri, 17 Nov 2017 14:19:39 +0000 (15:19 +0100)]
net: usb: hso.c: remove unneeded DRIVER_LICENSE #define

There is no need to #define the license of the driver, just put it in
the MODULE_LICENSE() line directly as a text string.

This allows tools that check that the module license matches the source
code license to work properly, as there is no need to unwind the
unneeded dereference.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andreas Kemnade <andreas@kemnade.info>
Cc: Johan Hovold <johan@kernel.org>
Reported-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: fix dma_mapping_error call
Desnes Augusto Nunes do Rosario [Fri, 17 Nov 2017 11:09:04 +0000 (09:09 -0200)]
ibmvnic: fix dma_mapping_error call

This patch fixes the dma_mapping_error call to use the correct dma_addr
which is inside the ibmvnic_vpd struct. Moreover, it fixes an uninitialized
warning regarding a local dma_addr variable which is not used anymore.

Fixes: 4e6759be28e4 ("ibmvnic: Feature implementation of VPD for the ibmvnic driver")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipvlan: NULL pointer dereference panic in ipvlan_port_destroy
Girish Moodalbail [Fri, 17 Nov 2017 07:16:17 +0000 (23:16 -0800)]
ipvlan: NULL pointer dereference panic in ipvlan_port_destroy

When the call to register_netdevice() (made from ipvlan_link_new())
fails, ipvlan_uninit() is called (through ndo_uninit()) to destroy the
ipvlan port. After register_netdevice() returns unsuccessfully, we go
ahead and call ipvlan_port_destroy() again, which causes a NULL pointer
dereference panic. Fix the issue by making the ipvlan_init() and
ipvlan_uninit() calls symmetric.

The ipvlan port will now be created inside ipvlan_init() and will be
destroyed in ipvlan_uninit().

Fixes: 2ad7bf363841 (ipvlan: Initial check-in of the IPVLAN driver)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
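The ipvlan fix rests on one rule: whichever callback creates the port (init) is the only one that destroys it (uninit), so a failed registration cannot lead to a second destroy. A minimal userspace model of that ownership rule follows; every struct and function name here is hypothetical, and register_dev() only imitates the behaviour described in the commit (the core calling uninit itself on a late failure).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real ipvlan structures. */
struct port { int users; };
struct dev  { struct port *port; bool reg_should_fail; };

static int port_destroy_calls; /* counts teardowns so a double destroy is visible */

static int dev_init(struct dev *d)          /* models ndo_init: creates the port */
{
    d->port = calloc(1, sizeof(*d->port));
    return d->port ? 0 : -1;
}

static void dev_uninit(struct dev *d)       /* models ndo_uninit: destroys the port */
{
    free(d->port);
    d->port = NULL;
    port_destroy_calls++;
}

/* Models register_netdevice(): on a late failure the core runs uninit itself. */
static int register_dev(struct dev *d)
{
    if (dev_init(d))
        return -1;
    if (d->reg_should_fail) {
        dev_uninit(d);
        return -1;
    }
    return 0;
}

/* Models the link-creation path: it neither creates nor destroys the port. */
static int link_new(struct dev *d)
{
    return register_dev(d);
}
```

Because link_new() no longer owns the port, its error path has nothing to clean up, which is the symmetry the commit message describes.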
7 years agoroute: also update fnhe_genid when updating a route cache
Xin Long [Fri, 17 Nov 2017 06:27:18 +0000 (14:27 +0800)]
route: also update fnhe_genid when updating a route cache

Currently, after "ip route flush cache", fnhe_genid != genid for every
fnhe. If a redirect/PMTU ICMP packet then arrives and the old fnhe is
found, all of its members except fnhe_genid are updated.

The next time a route lookup tries to rebind this fnhe to the new dst,
the fnhe is flushed because fnhe_genid != genid. As a result, the
redirect/PMTU ICMP packet is effectively never applied.

This patch also resets fnhe_genid when updating a route cache.

Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
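The generation-ID problem can be modelled in a few lines: an entry is valid only while its genid matches the current one, a cache flush bumps the current genid, and an update that refreshes every field except genid leaves the entry looking stale. This is a hypothetical userspace sketch of the idea, not the ipv4 route code; the struct and function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a next-hop exception entry. */
struct fnhe {
    int genid;          /* generation the entry belongs to */
    unsigned int pmtu;  /* one of the "members" an ICMP message updates */
};

static int current_genid; /* bumped by "ip route flush cache" in this model */

static void flush_cache(void)
{
    current_genid++;
}

/* Update an existing entry from a redirect/PMTU ICMP message. */
static void fnhe_update(struct fnhe *e, unsigned int pmtu)
{
    e->pmtu = pmtu;
    e->genid = current_genid; /* the fix: refresh the generation as well */
}

/* Checked when rebinding the entry to a new dst; stale entries get flushed. */
static bool fnhe_still_valid(const struct fnhe *e)
{
    return e->genid == current_genid;
}
```

Without the `genid` assignment in fnhe_update(), the freshly updated entry would fail fnhe_still_valid() and be discarded, which is the symptom the commit describes.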
7 years agoroute: update fnhe_expires for redirect when the fnhe exists
Xin Long [Fri, 17 Nov 2017 06:27:06 +0000 (14:27 +0800)]
route: update fnhe_expires for redirect when the fnhe exists

Currently, when an fnhe is created for a redirect, fnhe_expires is set
for the new route cache. But when an existing fnhe is updated, it is
not, so that fnhe never expires.

Paolo noticed this before; in Jianlin's test case it became even
worse:

After "ip route flush cache", the old fnhe is not removed; only its
members are cleared. When a redirect arrives again, this fnhe is found
and updated, but it never expires because fnhe_expires is not set.

So fix it by simply updating fnhe_expires even when it's for a redirect.

Fixes: aee06da6726d ("ipv4: use seqlock for nh_exceptions")
Reported-by: Jianlin Shi <jishi@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
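A small model makes the failure mode concrete: if a flush clears the entry's fields (including its expiry) and the redirect update path does not set the expiry again, the entry lives forever. The sketch below is hypothetical userspace C; REDIRECT_TIMEOUT, the tick units, and the convention that `expires == 0` means "no expiry" are assumptions of this model, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a next-hop exception entry. */
struct fnhe {
    unsigned long expires; /* 0 means "no expiry set" in this model */
    unsigned int gw;       /* redirect gateway, as an opaque number */
};

#define REDIRECT_TIMEOUT 300UL /* hypothetical lifetime, in abstract ticks */

/* Models the flush: the entry survives, only its members are cleared. */
static void fnhe_flush_members(struct fnhe *e)
{
    e->gw = 0;
    e->expires = 0;
}

/* Update an existing entry from a redirect. */
static void fnhe_update_redirect(struct fnhe *e, unsigned int gw, unsigned long now)
{
    e->gw = gw;
    e->expires = now + REDIRECT_TIMEOUT; /* the fix: refresh expiry on update too */
}

static bool fnhe_expired(const struct fnhe *e, unsigned long now)
{
    return e->expires != 0 && now >= e->expires;
}
```

After a flush the entry reports "not expired" indefinitely; the next redirect update re-arms the timer, so the entry ages out REDIRECT_TIMEOUT ticks later.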
7 years agosctp: set frag_point in sctp_setsockopt_maxseg correctly
Xin Long [Fri, 17 Nov 2017 06:11:11 +0000 (14:11 +0800)]
sctp: set frag_point in sctp_setsockopt_maxseg correctly

Currently, sctp_setsockopt_maxseg allows user_frag or frag_point to be
set to any val with val >= 8 and val <= SCTP_MAX_CHUNK_LEN. Both checks
are incorrect.

val >= 8 means frag_point can even be less than SCTP_DEFAULT_MINSEGMENT.
Then in sctp_datamsg_from_user(), when its value is greater than the
cookie echo len and it tries to bundle with the cookie echo chunk,
first_len will overflow.

The worst case is when its value equals the cookie echo len: first_len
becomes 0 and fragmentation later goes into an endless loop. In Hangbin's
syzkaller testing environment, OOM was even triggered due to the
repeated memory allocations in that loop.

Besides, SCTP_MAX_CHUNK_LEN is the max size of the whole chunk, so the
data header should be deducted from it for the frag_point or user_frag
check.

This patch does a proper check when setting frag_point via sockopt: the
lower bound is SCTP_DEFAULT_MINSEGMENT minus the sctphdr and datahdr,
and the upper bound is SCTP_MAX_CHUNK_LEN minus the datahdr. It also
cleans up the sctp_setsockopt_maxseg code.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
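The bounds described in the SCTP commit can be written as a small range check. This sketch subtracts only the headers the message names; the constant values (a 512-byte minimum segment, a 65532-byte maximum chunk, a 12-byte common header, a 16-byte DATA chunk header) are assumptions taken from the SCTP wire format and should be checked against the kernel's own definitions. The actual patch may account for additional terms.

```c
#include <stdbool.h>

/* Assumed values for illustration only; the MODEL_ prefix marks them as hypothetical. */
#define MODEL_DEFAULT_MINSEGMENT 512
#define MODEL_MAX_CHUNK_LEN      ((1 << 16) - 4)
#define MODEL_SCTPHDR_LEN        12 /* common header: ports, vtag, checksum */
#define MODEL_DATA_CHUNK_HDR_LEN 16 /* chunk header + TSN, stream, SSN, PPID */

/* Returns true if val is an acceptable fragment point under the described bounds. */
static bool maxseg_valid(int val)
{
    int min_len = MODEL_DEFAULT_MINSEGMENT - MODEL_SCTPHDR_LEN - MODEL_DATA_CHUNK_HDR_LEN;
    int max_len = MODEL_MAX_CHUNK_LEN - MODEL_DATA_CHUNK_HDR_LEN;

    return val >= min_len && val <= max_len;
}
```

With these numbers the accepted range is 484 to 65516, so the old lower bound of 8 and the old upper bound of SCTP_MAX_CHUNK_LEN are both rejected.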
7 years agorsi: fix memory leak on buf and usb_reg_buf
Colin Ian King [Thu, 16 Nov 2017 17:39:18 +0000 (17:39 +0000)]
rsi: fix memory leak on buf and usb_reg_buf

In the cases where len is too long, the error return path fails to
kfree allocated buffers buf and usb_reg_buf.  The simplest fix is to
perform the sanity check on len before the allocations to avoid having
to do the kfree'ing in the first place.

Detected by CoverityScan, CID#1452258,1452259 ("Resource Leak")

Fixes: 59f73e2ae185 ("rsi: check length before USB read/write register")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
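The rsi fix follows a common pattern: validate inputs before allocating, so an early error return has nothing to free. A minimal userspace sketch of that ordering follows, with hypothetical names and a hypothetical transfer limit; the two buffers mirror `buf` and `usb_reg_buf` from the commit, and a memset stands in for the USB transfer.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_XFER 64 /* hypothetical transfer-size limit */

static int model_usb_reg_read(size_t len, unsigned char *out)
{
    unsigned char *buf, *reg_buf;

    /* Sanity-check len before any allocation: this error path leaks nothing. */
    if (len > MODEL_MAX_XFER)
        return -EINVAL;

    buf = malloc(MODEL_MAX_XFER);
    if (!buf)
        return -ENOMEM;
    reg_buf = malloc(MODEL_MAX_XFER);
    if (!reg_buf) {
        free(buf); /* only buf exists at this point */
        return -ENOMEM;
    }

    /* Stand-in for the USB control transfer: fill with a recognizable pattern. */
    memset(reg_buf, 0xab, len);
    memcpy(buf, reg_buf, len);
    memcpy(out, buf, len);

    free(reg_buf);
    free(buf);
    return 0;
}
```

Moving the length check first removes the need for a kfree() on that path entirely, which is simpler than adding cleanup to it.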
7 years agonet/netlabel: Add list_next_rcu() in rcu_dereference().
Tim Hansen [Thu, 16 Nov 2017 17:03:34 +0000 (12:03 -0500)]
net/netlabel: Add list_next_rcu() in rcu_dereference().

Add list_next_rcu() for fetching the next list entry in rcu_dereference() safely.

Found with sparse in linux-next tree on tag next-20171116.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 18 Nov 2017 00:56:17 +0000 (16:56 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - a bit more MM

 - procfs updates

 - dynamic-debug fixes

 - lib/ updates

 - checkpatch

 - epoll

 - nilfs2

 - signals

 - rapidio

 - PID management cleanup and optimization

 - kcov updates

 - sysvipc updates

 - quite a few misc things all over the place

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  EXPERT Kconfig menu: fix broken EXPERT menu
  include/asm-generic/topology.h: remove unused parent_node() macro
  arch/tile/include/asm/topology.h: remove unused parent_node() macro
  arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
  arch/sh/include/asm/topology.h: remove unused parent_node() macro
  arch/ia64/include/asm/topology.h: remove unused parent_node() macro
  drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
  mm: add infrastructure for get_user_pages_fast() benchmarking
  sysvipc: make get_maxid O(1) again
  sysvipc: properly name ipc_addid() limit parameter
  sysvipc: duplicate lock comments wrt ipc_addid()
  sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
  initramfs: use time64_t timestamps
  drivers/watchdog: make use of devm_register_reboot_notifier()
  kernel/reboot.c: add devm_register_reboot_notifier()
  kcov: update documentation
  Makefile: support flag -fsanitizer-coverage=trace-cmp
  kcov: support comparison operands collection
  kcov: remove pointless current != NULL check
  kernel/panic.c: add TAINT_AUX
  ...

7 years agoEXPERT Kconfig menu: fix broken EXPERT menu
Randy Dunlap [Fri, 17 Nov 2017 23:31:47 +0000 (15:31 -0800)]
EXPERT Kconfig menu: fix broken EXPERT menu

Clean up the EXPERT menu (yet again).

Move FHANDLE and CHECKPOINT_RESTORE into the primary EXPERT menu since
they already depend on EXPERT.

Move BPF_SYSCALL and USERFAULTFD out of the EXPERT Kconfig symbols menu
list since they do not depend on EXPERT and were breaking the continuity
of that menu list.

Move all of the KALLSYMS Kconfig symbols to the end of the EXPERT menu.
This separates the kernel services from the build options.

This patch depends on [PATCH] pci: move PCI_QUIRKS to the PCI bus menu
(https://lkml.org/lkml/2017/11/2/907).

Link: http://lkml.kernel.org/r/72e4465a-a5ff-cb3c-1a90-11aa4861b161@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> [BPF]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoinclude/asm-generic/topology.h: remove unused parent_node() macro
Dou Liyang [Fri, 17 Nov 2017 23:31:43 +0000 (15:31 -0800)]
include/asm-generic/topology.h: remove unused parent_node() macro

Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().

The parent_node() macro is unnecessary in the generic case.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1504234599-29533-8-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarch/tile/include/asm/topology.h: remove unused parent_node() macro
Dou Liyang [Fri, 17 Nov 2017 23:31:40 +0000 (15:31 -0800)]
arch/tile/include/asm/topology.h: remove unused parent_node() macro

Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().

The parent_node() macro on the tile platform is unnecessary.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1504234599-29533-7-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
Dou Liyang [Fri, 17 Nov 2017 23:31:36 +0000 (15:31 -0800)]
arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro

Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().

The parent_node() macro on the SPARC64 platform is unnecessary.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1504234599-29533-6-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarch/sh/include/asm/topology.h: remove unused parent_node() macro
Dou Liyang [Fri, 17 Nov 2017 23:31:33 +0000 (15:31 -0800)]
arch/sh/include/asm/topology.h: remove unused parent_node() macro

Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().

The parent_node() macro on the SUPERH platform is unnecessary.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1504234599-29533-5-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarch/ia64/include/asm/topology.h: remove unused parent_node() macro
Dou Liyang [Fri, 17 Nov 2017 23:31:29 +0000 (15:31 -0800)]
arch/ia64/include/asm/topology.h: remove unused parent_node() macro

Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().

The parent_node() macro on the IA64 (Itanium) platform is unnecessary.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1504234599-29533-2-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 17 Nov 2017 23:31:26 +0000 (15:31 -0800)]
drivers/pcmcia/sa1111_badge4.c: avoid unused function warning

pcmv_setup() is only used when the badge4 driver is built-in, but not
when it is a loadable module:

  drivers/pcmcia/sa1111_badge4.c:153:122: error: 'pcmv_setup' defined but not used [-Werror=unused-function]

This adds an #ifdef to avoid the definition of the unused function in
the modular case.
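A minimal userspace sketch of the pattern, assuming the usual kernel convention that MODULE is defined when a file is built as a loadable module (in which case __setup() expands to nothing and the handler is left unreferenced). demo_setup() and demo_driver_init() are hypothetical stand-ins, not code from sa1111_badge4.c:

```c
#include <assert.h>
#include <stddef.h>

#ifndef MODULE
/* Only built-in code receives boot parameters, so only it needs the handler. */
static int demo_setup_called;

static int demo_setup(const char *arg)
{
	/* Record whether a non-empty parameter string was seen. */
	demo_setup_called = (arg != NULL && arg[0] != '\0');
	return 1;
}
#endif

int demo_driver_init(const char *boot_arg)
{
#ifndef MODULE
	demo_setup(boot_arg);
	return demo_setup_called;
#else
	(void)boot_arg;	/* modular build: no handler defined, nothing unused */
	return 0;
#endif
}
```

Compiling with -DMODULE drops both the handler and its only caller, so -Werror=unused-function has nothing to report.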

Link: http://lkml.kernel.org/r/20170911201133.3421636-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 17 Nov 2017 23:31:22 +0000 (15:31 -0800)]
mm: add infrastructure for get_user_pages_fast() benchmarking

Performance of get_user_pages_fast() is critical for some workloads, but
it's tricky to test it directly.

This patch provides /sys/kernel/debug/gup_benchmark, which helps with
testing its performance.

See tools/testing/selftests/vm/gup_benchmark.c for userspace
counterpart.

Link: http://lkml.kernel.org/r/20170908215603.9189-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Fri, 17 Nov 2017 23:31:18 +0000 (15:31 -0800)]
sysvipc: make get_maxid O(1) again

For a custom microbenchmark on a 3.30GHz Xeon SandyBridge, which calls
IPC_STAT over and over, it was calculated that, on average, the cost of
ipc_get_maxid() for increasing numbers of keys was:

 10 keys: ~900 cycles
 100 keys: ~15000 cycles
 1000 keys: ~150000 cycles
 10000 keys: ~2100000 cycles

This is unsurprising as maxid is currently O(n).

By having max_id available in O(1), we save all those cycles for each
semctl(_STAT) command, which some real (customer) workloads actually
poll on; the idr_find calls can be expensive.

Note that this used to be the case until commit 7ca7e564e04 ("ipc:
store ipcs into IDRs").  The cost is an extra idr_find when doing RMIDs,
but we simply go backwards, which should not take many iterations to
find the new value.
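To illustrate the idea, here is a hedged sketch of an O(1) cached maximum with a backwards scan on removal. A plain array stands in for the IDR, and the demo_* names and DEMO_SLOTS size are hypothetical; this is not the ipc/util.c code:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SLOTS 64	/* hypothetical table size */

struct demo_ids {
	bool used[DEMO_SLOTS];
	int max_idx;	/* highest occupied index, -1 when empty */
};

static void demo_ids_init(struct demo_ids *ids)
{
	for (int i = 0; i < DEMO_SLOTS; i++)
		ids->used[i] = false;
	ids->max_idx = -1;
}

/* Adding an entry can only raise the maximum: O(1). */
static void demo_add(struct demo_ids *ids, int idx)
{
	assert(idx >= 0 && idx < DEMO_SLOTS && !ids->used[idx]);
	ids->used[idx] = true;
	if (idx > ids->max_idx)
		ids->max_idx = idx;
}

/*
 * Removing the current maximum scans backwards to the next occupied
 * slot; removing any other entry leaves the cached value untouched.
 */
static void demo_remove(struct demo_ids *ids, int idx)
{
	assert(idx >= 0 && idx < DEMO_SLOTS && ids->used[idx]);
	ids->used[idx] = false;
	if (idx == ids->max_idx) {
		int i = idx - 1;

		while (i >= 0 && !ids->used[i])
			i--;
		ids->max_idx = i;
	}
}

/* The query that IPC_STAT-style callers poll on: O(1). */
static int demo_get_maxid(const struct demo_ids *ids)
{
	return ids->max_idx;
}
```

The trade-off mirrors the one described above: the common query becomes constant time, and only removal of the largest index pays for a (usually short) backwards walk.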

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170831172049.14576-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Fri, 17 Nov 2017 23:31:15 +0000 (15:31 -0800)]
sysvipc: properly name ipc_addid() limit parameter

This parameter is better understood as a limit than as a size, exactly
as the function comment indicates.  Rename it.

Link: http://lkml.kernel.org/r/20170831172049.14576-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Fri, 17 Nov 2017 23:31:11 +0000 (15:31 -0800)]
sysvipc: duplicate lock comments wrt ipc_addid()

The comment at the ipc_addid() call in the message queue code is quite
useful, IMO.  Duplicate it for shm and semaphores.

Link: http://lkml.kernel.org/r/20170831172049.14576-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Fri, 17 Nov 2017 23:31:08 +0000 (15:31 -0800)]
sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE

Patch series "sysvipc: ipc-key management improvements".

Here are a few improvements I spotted while eyeballing Guillaume's
rhashtable implementation for ipc keys.  The first and fourth patches
are the interesting ones; the middle two are trivial.

This patch (of 4):

The next_id object-allocation functionality was introduced in commit
03f595668017 ("ipc: add sysctl to specify desired next object id").

Given that these new entries are _only_ exported under the
CONFIG_CHECKPOINT_RESTORE option, there is no point in the common case
even knowing about ->next_id.  As such, rewrite ipc_buildid() so that
it can do away with the field, as well as with unnecessary branches,
when adding a new identifier.  The end result also better differentiates
the two cases, so the code ends up cleaner, albeit with some small
duplication in the default case.
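For orientation, a hedged sketch of how an IPC identifier combines a sequence number with a slot index, with the requested-id handling compiled only in a checkpoint/restore build. The multiplier of 32768 (IPCMNI in kernels of this era), the DEMO_CHECKPOINT_RESTORE switch and the demo_* names are assumptions for illustration, not the kernel's ipc_buildid():

```c
#include <assert.h>

/* Assumed index/sequence split, as with IPCMNI = 32768. */
#define DEMO_SEQ_MULTIPLIER 32768

struct demo_ids {
	int seq;		/* bumped for every new identifier */
#ifdef DEMO_CHECKPOINT_RESTORE
	int next_id;		/* requested id or -1; restore-only field */
#endif
};

/* Build an id from a freshly allocated slot index. */
static int demo_buildid(struct demo_ids *ids, int idx)
{
	assert(idx >= 0 && idx < DEMO_SEQ_MULTIPLIER);
#ifdef DEMO_CHECKPOINT_RESTORE
	if (ids->next_id >= 0) {
		/*
		 * Restore path: reuse the requested id's sequence number.
		 * The caller is assumed to have allocated slot
		 * next_id % DEMO_SEQ_MULTIPLIER.
		 */
		int seq = ids->next_id / DEMO_SEQ_MULTIPLIER;

		ids->next_id = -1;
		return seq * DEMO_SEQ_MULTIPLIER + idx;
	}
#endif
	/* Common case: no next_id field and no branch on it. */
	return ids->seq++ * DEMO_SEQ_MULTIPLIER + idx;
}
```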

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170831172049.14576-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 17 Nov 2017 23:31:04 +0000 (15:31 -0800)]
initramfs: use time64_t timestamps

The cpio format uses a 32-bit number to encode file timestamps, which
breaks initramfs support in 2038.  This patch reinterprets the timestamp
as unsigned, which gives us another 68 years and avoids breakage until
2106.
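A small sketch of the reinterpretation, with hypothetical helper names: 0x80000000 seconds after the epoch (2038-01-19 03:14:08 UTC) is negative when read as a signed 32-bit value but stays positive when read as unsigned, and 0xFFFFFFFF lands in February 2106.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old reading: the 32-bit cpio field ends up in a signed 32-bit time, so
 * values from 0x80000000 on become negative.  (Narrowing an out-of-range
 * value to int32_t is implementation-defined; GCC and Clang wrap it.)
 */
static int64_t demo_mtime_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* New reading: treat the field as unsigned and widen it to 64 bits. */
static int64_t demo_mtime_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```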

Link: http://lkml.kernel.org/r/20171019095536.801199-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Smirnov [Fri, 17 Nov 2017 23:31:01 +0000 (15:31 -0800)]
drivers/watchdog: make use of devm_register_reboot_notifier()

Save a bit of cleanup code by leveraging the newly added
devm_register_reboot_notifier().

[akpm@linux-foundation.org: small cleanup: avoid 80-col tricks]
Link: http://lkml.kernel.org/r/20170411160615.9784-1-andrew.smirnov@gmail.com
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Smirnov [Fri, 17 Nov 2017 23:30:57 +0000 (15:30 -0800)]
kernel/reboot.c: add devm_register_reboot_notifier()

Add a devm_* wrapper around register_reboot_notifier() to simplify
device-specific reboot notifier registration and unregistration.

[akpm@linux-foundation.org: move `struct device' forward decl to top-of-file]
Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.com
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Victor Chibotaru [Fri, 17 Nov 2017 23:30:53 +0000 (15:30 -0800)]
kcov: update documentation

The updated documentation describes the new KCOV mode for collecting
comparison operands.

Link: http://lkml.kernel.org/r/20171011095459.70721-3-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Victor Chibotaru [Fri, 17 Nov 2017 23:30:50 +0000 (15:30 -0800)]
Makefile: support flag -fsanitize-coverage=trace-cmp

The flag enables Clang instrumentation of comparison operations
(currently not supported by GCC).  This instrumentation is needed by the
new KCOV device to collect comparison operands.

Link: http://lkml.kernel.org/r/20171011095459.70721-2-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Victor Chibotaru [Fri, 17 Nov 2017 23:30:46 +0000 (15:30 -0800)]
kcov: support comparison operands collection

Enable kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize-coverage=trace-cmp
instrumentation (currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g., Syzkaller
uses them to cover the interiors of conditional statements with far
fewer attempts, making previously unreachable code reachable.

To allow coverage and comparison operands to be collected separately,
two different modes are implemented.  The mode is now selected via the
argument of the KCOV_ENABLE ioctl call.
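The buffer filled in comparison mode can be decoded as below. This sketch assumes the record layout described in the kernel's kcov documentation of this era: cover[0] holds the record count, followed by four 64-bit words per comparison (type, arg1, arg2, ip), with the operand size encoded in bits 1-2 of type and bit 0 marking a compile-time-constant operand. Check it against your kernel's Documentation/dev-tools/kcov.rst; the sketch only parses a buffer and does not open /sys/kernel/debug/kcov, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of the type word, following the kcov documentation. */
#define KCOV_CMP_CONST		(1 << 0)
#define KCOV_CMP_SIZE(n)	((n) << 1)
#define KCOV_CMP_MASK		KCOV_CMP_SIZE(3)
#define KCOV_WORDS_PER_CMP	4

struct demo_cmp {
	unsigned size;		/* operand size in bytes: 1, 2, 4 or 8 */
	int is_const;		/* non-zero if one operand is a constant */
	uint64_t arg1, arg2;	/* the compared operands */
	uint64_t ip;		/* address of the comparison */
};

/*
 * Decode record i from a buffer of 'words' 64-bit entries.
 * Returns 0 on success, -1 if i is past the recorded count or the buffer.
 */
static int demo_decode_cmp(const uint64_t *cover, size_t words, uint64_t i,
			   struct demo_cmp *out)
{
	size_t base;

	assert(words > 0);
	if (i >= cover[0])
		return -1;
	base = (size_t)i * KCOV_WORDS_PER_CMP;
	if (base + KCOV_WORDS_PER_CMP >= words)
		return -1;
	out->size = 1u << ((cover[base + 1] & KCOV_CMP_MASK) >> 1);
	out->is_const = (cover[base + 1] & KCOV_CMP_CONST) != 0;
	out->arg1 = cover[base + 2];
	out->arg2 = cover[base + 3];
	out->ip = cover[base + 4];
	return 0;
}
```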

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Fri, 17 Nov 2017 23:30:42 +0000 (15:30 -0800)]
kcov: remove pointless current != NULL check

__sanitizer_cov_trace_pc() is hot code, so it is worth removing the
pointless '!current' check: current is never NULL.

Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Fri, 17 Nov 2017 23:30:38 +0000 (15:30 -0800)]
kernel/panic.c: add TAINT_AUX

This is the gist of a patch which we've been forward-porting in our
kernels for a long time now, and it would probably make good sense to
have such a TAINT_AUX flag upstream, which each distro etc. can use as
it sees fit.  This way, we won't need to forward-port a distro-only
version indefinitely.

Add an auxiliary taint flag to be used by distros and others.  This
obviates the need to forward-port whatever internal solutions people
have, in favor of a single flag which they can map arbitrarily to a
definition of their choosing.

The "X" mnemonic could also mean eXternal, which would be taint from a
distro or something else but not the upstream kernel.  We will use it to
mark modules for which we don't provide support, i.e., truly eXternal
modules.
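A hedged userspace sketch for checking the new flag in the value read from /proc/sys/kernel/tainted. It assumes TAINT_AUX occupies bit 16, as in kernels of this era; DEMO_TAINT_AUX and demo_has_aux_taint() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: bit number of TAINT_AUX ('X') as introduced by this patch. */
#define DEMO_TAINT_AUX 16

/* True if the auxiliary taint is set in a /proc/sys/kernel/tainted value. */
static bool demo_has_aux_taint(unsigned long tainted)
{
	return (tainted >> DEMO_TAINT_AUX) & 1UL;
}
```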

Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gargi Sharma [Fri, 17 Nov 2017 23:30:34 +0000 (15:30 -0800)]
pid: remove pidhash

pidhash is no longer required, as all the information can be looked up
from the IDR tree.  nr_hashed represented the number of pids that had
been hashed.  Since nr_hashed and PIDNS_HASH_ADDING are no longer
relevant, they have been renamed to pid_allocated and PIDNS_ADDING,
respectively.

[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com> [ia64]
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gargi Sharma [Fri, 17 Nov 2017 23:30:30 +0000 (15:30 -0800)]
pid: replace pid bitmap implementation with IDR API

Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces the kernel's bitmap implementation of PID
allocation with the IDR API.  These patches are written to simplify the
kernel by replacing custom code with calls to generic code.

The following are the stats for the pid and pid_namespace object files
before and after the replacement.  The IDR implementation is noticeably
smaller than the bitmap one.

Before
   text       data        bss        dec        hex    filename
   8447       3894         64      12405       3075    kernel/pid.o
After
   text       data        bss        dec        hex    filename
   3397        304          0       3701        e75    kernel/pid.o

Before
   text       data        bss        dec        hex    filename
   5692       1842        192       7726       1e2e    kernel/pid_namespace.o
After
   text       data        bss        dec        hex    filename
   2854        216         16       3086        c0e    kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
        With IDR API    With bitmap
real    0m1.479s        0m2.319s
user    0m0.070s        0m0.060s
sys     0m0.289s        0m0.516s

pstree:
        With IDR API    With bitmap
real    0m1.024s        0m1.794s
user    0m0.348s        0m0.612s
sys     0m0.184s        0m0.264s

proc:
        With IDR API    With bitmap
real    0m0.059s        0m0.074s
user    0m0.000s        0m0.004s
sys     0m0.016s        0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, such as free_pidmap() and
alloc_pidmap(), are removed.  The rest of the functions are modified to
use the IDR API.  The change was made to make PID allocation less
complex by replacing custom code with calls to a generic API.
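As we understand it, alloc_pid() after this series obtains numbers from idr_alloc_cyclic(), so a freed PID is not handed out again immediately. The sketch below models only those cyclic semantics with a small array; it is not the radix-tree-based IDR, and DEMO_MAX_ID and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_ID 16	/* hypothetical tiny pid_max for illustration */

struct demo_idr {
	bool used[DEMO_MAX_ID];
	int cursor;	/* next id to try, like the IDR's cyclic cursor */
};

/*
 * Return the first free id in [cursor, end), wrapping around to start,
 * or -1 if the whole range [start, end) is in use.
 */
static int demo_alloc_cyclic(struct demo_idr *idr, int start, int end)
{
	assert(0 <= start && start < end && end <= DEMO_MAX_ID);

	int first = idr->cursor;

	if (first < start || first >= end)
		first = start;
	for (int n = 0; n < end - start; n++) {
		int id = start + (first - start + n) % (end - start);

		if (!idr->used[id]) {
			idr->used[id] = true;
			idr->cursor = id + 1;
			return id;
		}
	}
	return -1;
}

static void demo_free_id(struct demo_idr *idr, int id)
{
	idr->used[id] = false;
}
```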

[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ola N. Kaldestad [Fri, 17 Nov 2017 23:30:26 +0000 (15:30 -0800)]
kernel/sysctl.c: code cleanups

Remove an unnecessary else block, and remove a redundant return and
call to kfree() in an if block.

Link: http://lkml.kernel.org/r/1510238435-1655-1-git-send-email-mail@okal.no
Signed-off-by: Ola N. Kaldestad <mail@okal.no>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kangmin Park [Fri, 17 Nov 2017 23:30:23 +0000 (15:30 -0800)]
Documentation/sysctl/vm.txt: fix typo

Link: http://lkml.kernel.org/r/CAKW4uUyCi=PnKf3epgFVz8z=1tMtHSOHNm+fdNxrNw3-THvRCA@mail.gmail.com
Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christophe JAILLET [Fri, 17 Nov 2017 23:38:03 +0000 (15:38 -0800)]
drivers/rapidio/devices/rio_mport_cdev.c: fix error handling in 'rio_dma_transfer()'

In case of error, 'dma_map_sg()' returns 0, not a negative value.  There
is a BUG_ON() in 'dma_map_sg_attrs()' which makes sure of that.

Link: http://lkml.kernel.org/r/d4235bd2b9274e99f6c86ea71b1fa1c7bd8d0c08.1505687047.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christophe JAILLET [Fri, 17 Nov 2017 23:37:57 +0000 (15:37 -0800)]
drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()'

If 'dma_map_sg()' fails, we should branch to the existing error
handling path to free some resources before returning.
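The two rio_dma_transfer() fixes come down to one pattern: treat a zero return from the mapping call as failure, and leave through the existing cleanup label. A userspace sketch with a hypothetical demo_map_sg() stand-in; the -EFAULT error code and the demo_* names are assumptions, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for dma_map_sg(): mapped entry count, or 0 (never < 0) on failure. */
static int demo_map_sg(int nents, int fail)
{
	return fail ? 0 : nents;
}

static int demo_cleanups;	/* counts cleanup runs, for illustration only */

static int demo_dma_transfer(int nents, int fail_map)
{
	int ret = 0;
	int *pages;

	assert(nents > 0);
	pages = malloc(sizeof(*pages) * (size_t)nents);
	if (!pages)
		return -ENOMEM;

	int mapped = demo_map_sg(nents, fail_map);
	if (mapped == 0) {	/* a "< 0" test would never fire */
		ret = -EFAULT;
		goto out_free;	/* release what was already allocated */
	}

	/* ... submit the transfer using 'mapped' entries ... */

out_free:
	free(pages);
	demo_cleanups++;
	return ret;
}
```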

Link: http://lkml.kernel.org/r/61292a4f369229eee03394247385e955027283f8.1505687047.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arvind Yadav [Fri, 17 Nov 2017 23:30:15 +0000 (15:30 -0800)]
rapidio: constify rio_device_id

rio_device_id structures are not supposed to change at runtime, and the
rio driver core works with a const 'id_table'.  So mark the non-const
rio_device_id structs as const.

Link: http://lkml.kernel.org/r/1503734627-6058-2-git-send-email-arvind.yadav.cs@gmail.com
Link: http://lkml.kernel.org/r/1503734627-6058-3-git-send-email-arvind.yadav.cs@gmail.com
Link: http://lkml.kernel.org/r/1503734627-6058-4-git-send-email-arvind.yadav.cs@gmail.com
Link: http://lkml.kernel.org/r/1503734627-6058-5-git-send-email-arvind.yadav.cs@gmail.com
Link: http://lkml.kernel.org/r/1503734627-6058-6-git-send-email-arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Fri, 17 Nov 2017 23:30:12 +0000 (15:30 -0800)]
kdump: print a message in case parse_crashkernel_mem resulted in zero bytes

parse_crashkernel_mem() silently returns if the parsing yields zero
bytes.  A message is useful for debugging, especially if the kernel
cannot boot correctly.

Use pr_info instead of pr_warn because size = 0 is expected behavior in
some cases: e.g., with crashkernel=2G-4G:128M, the size will be 0 if
system memory is less than 2G.
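A simplified sketch of the range syntax, to show why a size of 0 is an expected result. It assumes each range means start <= RAM < end, with an omitted end meaning "no upper bound". The optional @offset suffix is not handled, malformed input is also reported as 0 for brevity, and the demo_* helpers are hypothetical (demo_memparse() only imitates the kernel's memparse()).

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix. */
static unsigned long long demo_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 10;	/* fall through */
	case 'M': case 'm': v <<= 10;	/* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	default: break;
	}
	return v;
}

/* Size for "start-[end]:size[,...]" given 'ram' bytes, or 0 if no range matches. */
static unsigned long long demo_crash_size(const char *spec, unsigned long long ram)
{
	const char *p = spec;

	while (*p) {
		char *q;
		unsigned long long start = demo_memparse(p, &q);
		unsigned long long end = ~0ULL;	/* open-ended range */

		if (*q != '-')
			return 0;
		q++;
		if (*q != ':')
			end = demo_memparse(q, &q);
		if (*q != ':')
			return 0;
		unsigned long long size = demo_memparse(q + 1, &q);

		if (ram >= start && ram < end)
			return size;
		if (*q != ',')
			break;
		p = q + 1;
	}
	return 0;	/* the case the patch now reports with pr_info() */
}
```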

Link: http://lkml.kernel.org/r/20171114080129.GA6115@dhcp-128-65.nay.redhat.com
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 17 Nov 2017 23:30:08 +0000 (15:30 -0800)]
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()

complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group; today this is wrong in many ways.

If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, and this check breaks that rule.

After the previous changes we can rely on sig_task_ignored():
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL, OR if it is traced and the
debugger can intercept the signal.

This should hopefully fix the problem reported by Dmitry.  This test
case

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static int init(void *arg)
{
	for (;;)
		pause();
}

int main(void)
{
	char stack[16 * 1024];

	for (;;) {
		int pid = clone(init, stack + sizeof(stack)/2,
				CLONE_NEWPID | SIGCHLD, NULL);
		assert(pid > 0);

		assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
		assert(waitpid(-1, NULL, WSTOPPED) == pid);

		assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
		assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
		assert(pid == wait(NULL));
	}
}

triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop().  do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and returns false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.

And this should fix the minor security problem reported by Kyle:
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.

Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 17 Nov 2017 23:30:04 +0000 (15:30 -0800)]
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals

Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force is true.  This simplifies the next change and
matches the same check in get_signal(), which will drop these signals
anyway.

Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 17 Nov 2017 23:30:01 +0000 (15:30 -0800)]
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL

The comment in sig_ignored() says "Tracers may want to know about even
ignored signals", but SIGKILL cannot be reported to the debugger, and it
is just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.

Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().

SIGSTOP coming from within the namespace is not really right either, but
at least the debugger can intercept it, and we can't drop it here because
this would break "gdb -p 1": ptrace_attach() wouldn't work.  Perhaps we
will add another ->ptrace check later; we will see.

Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Fri, 17 Nov 2017 23:29:57 +0000 (15:29 -0800)]
fat: remove redundant assignment of 0 to slots

The variable slots is assigned a value of zero that is never read,
since slots is updated again a few lines later.  Remove this redundant
assignment.

Cleans up clang warning: Value stored to 'slots' is never read

Link: http://lkml.kernel.org/r/20171017140258.22536-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christos Gkekas [Fri, 17 Nov 2017 23:29:54 +0000 (15:29 -0800)]
hfs/hfsplus: clean up unused variables in bnode.c

Delete variables 'tree' and 'sb', which are set but never used.

Link: http://lkml.kernel.org/r/1507977146-15875-1-git-send-email-chris.gekas@gmail.com
Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Fri, 17 Nov 2017 23:29:50 +0000 (15:29 -0800)]
nilfs2: remove inode->i_version initialization

It's never used in nilfs2.

Link: http://lkml.kernel.org/r/1510064486-1728-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Fri, 17 Nov 2017 23:29:46 +0000 (15:29 -0800)]
nilfs2: use octal for unreadable permission macro

Replace S_IRWXUGO with 0777 because symbolic permissions are considered
harmful:

 https://lwn.net/Articles/696229/

Link: http://lkml.kernel.org/r/1509367935-3086-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Fri, 17 Nov 2017 23:29:43 +0000 (15:29 -0800)]
nilfs2: align block comments of nilfs_sufile_truncate_range() at *

Fix the following checkpatch warning:

 WARNING: Block comments should align the * on each line
 #633: FILE: sufile.c:633:
 +/**
 +  * nilfs_sufile_truncate_range - truncate range of segment array

Link: http://lkml.kernel.org/r/1509367935-3086-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Elena Reshetova [Fri, 17 Nov 2017 23:29:39 +0000 (15:29 -0800)]
fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t

atomic_t variables are currently used to implement reference counters
with the following properties:

 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided refcount_t
type and API that prevents accidental counter overflows and underflows.
This is important since overflows and underflows can lead to
use-after-free situations and be exploitable.

The variable nilfs_root.count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
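To make the listed properties concrete, here is a userspace sketch of refcount_t-style semantics using C11 atomics. It is not the kernel implementation: the saturation value of UINT_MAX and the demo_* names are assumptions. Increments from zero are refused, and the counter saturates instead of wrapping, so an overflow or underflow leaks the object rather than freeing it early.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_REFCOUNT_SATURATED UINT_MAX	/* assumed saturation point */

typedef struct { atomic_uint refs; } demo_refcount_t;

static void demo_refcount_set(demo_refcount_t *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Like refcount_inc_not_zero(): false if the object is already dead. */
static bool demo_refcount_inc_not_zero(demo_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* refuse to resurrect a freed object */
		if (old == DEMO_REFCOUNT_SATURATED)
			return true;	/* stay saturated, never wrap to 0 */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Like refcount_dec_and_test(): true only when the last reference goes. */
static bool demo_refcount_dec_and_test(demo_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == DEMO_REFCOUNT_SATURATED)
			return false;	/* underflow or saturated: leak, don't free */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```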

Link: http://lkml.kernel.org/r/1509367935-3086-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andreas Rohner [Fri, 17 Nov 2017 23:29:35 +0000 (15:29 -0800)]
nilfs2: fix race condition that causes file system corruption

There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode.  It calls __nilfs_mark_inode_dirty() in a
separate transaction.  __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.

Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field.
If there is a cached buffer_head in i_bh it is not marked as dirty
again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in between the two.  If this happens, the inode is not
on the ns_dirty_files list, but its ifile block is still marked as dirty
and written out.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree.  Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.

As a result the bmap update can be lost, which leads to file system
corruption.  Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.
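The interleaving above can be replayed with a toy, single-threaded model. All names are hypothetical. The three booleans stand in for membership in ns_dirty_files, the cached i_bh, and its dirty bit. The model encodes only the rule described here: a cached i_bh is not re-dirtied during collection. It reproduces the lost update, but it does not show how the patch fixes it.

```c
#include <stdbool.h>

/* Toy model of one inode; all names are hypothetical. */
struct sketch_inode {
    bool on_dirty_files;  /* inode is on the ns_dirty_files list */
    bool bh_cached;       /* ifile buffer_head cached in i_bh */
    bool bh_dirty;        /* that buffer_head is marked dirty */
};

/* nilfs_dirty_inode() path: cache the ifile block and mark it dirty. */
static void sketch_dirty_inode(struct sketch_inode *ino)
{
    ino->bh_cached = true;
    ino->bh_dirty = true;
}

/* nilfs_set_file_dirty() path: queue the inode on the dirty-files list. */
static void sketch_set_file_dirty(struct sketch_inode *ino)
{
    ino->on_dirty_files = true;
}

/*
 * One segment construction.  Returns true if the inode's ifile block (the
 * block that receives the updated bmap root) is written out this time.
 */
static bool sketch_construct_segment(struct sketch_inode *ino)
{
    bool written;

    if (ino->on_dirty_files) {
        /* Collection only dirties i_bh when none is cached; a cached
         * i_bh is assumed to be dirty already. */
        if (!ino->bh_cached) {
            ino->bh_cached = true;
            ino->bh_dirty = true;
        }
        ino->on_dirty_files = false;
    }

    /* Dirty ifile blocks are written out and become clean. */
    written = ino->bh_dirty;
    ino->bh_dirty = false;
    return written;
}
```

When the two transactions run back to back, the ifile block is written together with the data. When a construction slips in between them, the second construction skips the already-clean cached block, so the bmap root update never reaches disk.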

The error can remain undetected for a long time.  A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.

Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 17 Nov 2017 23:29:32 +0000 (15:29 -0800)]
fs/nilfs2: convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer
to all timer callbacks, switch to using the new timer_setup() and
from_timer() to pass the timer pointer explicitly.  This requires adding
a pointer to hold the timer's target task, as the lifetime of sc_task
doesn't appear to match the timer's task.
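The from_timer() helper is a thin wrapper around container_of(). The callback receives only the struct timer_list pointer and steps back to the structure that embeds it. The sketch below re-creates that pattern in userspace with offsetof(). The structure, field, and function names are hypothetical stand-ins. They are loosely modeled on a segment-constructor context that keeps the timer's target task in a field of its own.

```c
#include <stddef.h>

/* Userspace re-implementation of the container_of() idea behind
 * from_timer(): step back from a member pointer to its container. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced timer: the callback gets the timer itself. */
struct sketch_timer_list {
    void (*function)(struct sketch_timer_list *t);
};

struct sketch_task {
    int woken;
};

/* Hypothetical stand-in for the segment constructor's private data. */
struct sketch_sc_info {
    unsigned long sc_interval;           /* keeps sc_timer off offset 0 */
    struct sketch_timer_list sc_timer;
    struct sketch_task *sc_timer_task;   /* task the timer should wake */
};

static void sketch_timer_setup(struct sketch_timer_list *t,
                               void (*fn)(struct sketch_timer_list *))
{
    t->function = fn;
}

/* Callback: recover the containing structure from the timer pointer. */
static void sketch_construction_timeout(struct sketch_timer_list *t)
{
    struct sketch_sc_info *sci =
        sketch_container_of(t, struct sketch_sc_info, sc_timer);

    sci->sc_timer_task->woken = 1;
}
```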

Link: http://lkml.kernel.org/r/20171016235900.GA102729@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Lawrence [Fri, 17 Nov 2017 23:29:28 +0000 (15:29 -0800)]
sysctl: check for UINT_MAX before unsigned int min/max

Mikulas noticed that the existing do_proc_douintvec_minmax_conv() and the
do_proc_dopipe_max_size_conv() introduced in this patchset inconsistently
handle overflow and min/max range inputs:

For example:

  0 ... param->min - 1 ---> ERANGE
  param->min ... param->max ---> the value is accepted
  param->max + 1 ... 0x100000000L + param->min - 1 ---> ERANGE
  0x100000000L + param->min ... 0x100000000L + param->max ---> EINVAL
  0x100000000L + param->max + 1 ... 0x200000000L + param->min - 1 ---> ERANGE
  0x200000000L + param->min ... 0x200000000L + param->max ---> EINVAL
  0x200000000L + param->max + 1 ... 0x300000000L + param->min - 1 ---> ERANGE

In do_proc_do*() routines which store values into unsigned int variables
(4 bytes wide for 64-bit builds), first validate that the input unsigned
long value (8 bytes wide for 64-bit builds) will fit inside the smaller
unsigned int variable.  Then check that the unsigned int value falls
inside the specified parameter min, max range.  Otherwise the unsigned
long -> unsigned int conversion drops leading bits from the input value,
leading to the inconsistent pattern Mikulas documented above.
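A minimal sketch of the check order described above. It assumes that uint64_t stands in for the 64-bit build's unsigned long and uint32_t for unsigned int. It also simplifies the min/max bounds to plain values rather than optional pointers. The function name and the choice of -EINVAL for a value that does not fit are illustrative and are not taken from the kernel.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical converter modeling a 64-bit build: the parsed value is
 * 64 bits wide and is stored into a 32-bit unsigned int.
 */
static int sketch_uintvec_minmax_conv(uint64_t lval, uint32_t min,
                                      uint32_t max, uint32_t *out)
{
    uint32_t val;

    /* Step 1: reject values that do not fit, before any truncation. */
    if (lval > UINT32_MAX)
        return -EINVAL;          /* error code chosen for this sketch */
    val = (uint32_t)lval;

    /* Step 2: only now compare against the parameter range. */
    if (val < min || val > max)
        return -ERANGE;

    *out = val;
    return 0;
}
```

Casting first would turn 0x100000000 + 15 into 15, which then passes a 10..20 range check. Testing against UINT32_MAX before the cast removes that path.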

Link: http://lkml.kernel.org/r/1507658689-11669-5-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Lawrence [Fri, 17 Nov 2017 23:29:24 +0000 (15:29 -0800)]
pipe: add proc_dopipe_max_size() to safely assign pipe_max_size

pipe_max_size is assigned directly via procfs sysctl:

  static struct ctl_table fs_table[] = {
          ...
          {
                  .procname       = "pipe-max-size",
                  .data           = &pipe_max_size,
                  .maxlen         = sizeof(int),
                  .mode           = 0644,
                  .proc_handler   = &pipe_proc_fn,
                  .extra1         = &pipe_min_size,
          },
          ...

  int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
                   size_t *lenp, loff_t *ppos)
  {
          ...
          ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
          ...

and then rounded in place a few statements later:

          ...
          pipe_max_size = round_pipe_size(pipe_max_size);
          ...

This leaves a window of time between initial assignment and rounding
that may be visible to other threads.  (For example, one thread sets a
non-rounded value to pipe_max_size while another reads its value.)

Similar reads of pipe_max_size are potentially racy:

  pipe.c :: alloc_pipe_info()
  pipe.c :: pipe_set_size()

Add a new proc_dopipe_max_size() that consolidates reading the new value
from the user buffer, verifying bounds, and calling round_pipe_size()
with a single assignment to pipe_max_size.
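The essential idea is to do all the work on a local value and write the shared variable exactly once. This is a hypothetical userspace sketch of that idea. It takes an already-parsed value instead of reading a user buffer, and it assumes 4 KiB pages with a one-page minimum. It uses a C11 atomic store to make the single publication explicit. That store is a choice of this sketch and is not necessarily what the kernel does.

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-ins for pipe_max_size and pipe_min_size. */
static _Atomic unsigned int sketch_pipe_max_size = 1048576;
static const unsigned long sketch_pipe_min_size = SKETCH_PAGE_SIZE;

/* Round up to a power-of-two number of pages; 0 means "no valid size". */
static unsigned int sketch_round_pipe_size(unsigned long size)
{
    unsigned long nr_pages = (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
    unsigned long pages = 1;

    if (nr_pages == 0)
        return 0;
    while (pages < nr_pages)
        pages <<= 1;
    /* A result of 2^32 does not fit in unsigned int and becomes 0. */
    return (unsigned int)(pages << SKETCH_PAGE_SHIFT);
}

/*
 * Validate and round into a local, then publish with a single store, so
 * no other thread can observe an unrounded or out-of-range value.
 */
static int sketch_set_pipe_max_size(unsigned long val)
{
    unsigned int rounded;

    if (val < sketch_pipe_min_size || val > UINT_MAX)
        return -EINVAL;
    rounded = sketch_round_pipe_size(val);
    if (rounded == 0)
        return -EINVAL;
    atomic_store(&sketch_pipe_max_size, rounded);   /* the only write */
    return 0;
}
```

Readers such as the allocation and resize paths would then load the variable once. Any value they see has already been rounded.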

Link: http://lkml.kernel.org/r/1507658689-11669-4-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Lawrence [Fri, 17 Nov 2017 23:29:21 +0000 (15:29 -0800)]
pipe: avoid round_pipe_size() nr_pages overflow on 32-bit

round_pipe_size() contains a right-bit-shift expression whose operand,
size + PAGE_SIZE - 1, may overflow, which would cause undefined results
in a subsequent roundup_pow_of_two() call.

  static inline unsigned int round_pipe_size(unsigned int size)
  {
          unsigned long nr_pages;

          nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
          return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
  }

PAGE_SIZE is defined as (1UL << PAGE_SHIFT), so the expression is evaluated as an unsigned long, which is:
  - 4 bytes wide on 32-bit (0 to 0xffffffff)
  - 8 bytes wide on 64-bit (0 to 0xffffffffffffffff)

That means that in 32-bit round_pipe_size(), nr_pages may overflow to 0:

  size=0x00000000    nr_pages=0x0
  size=0x00000001    nr_pages=0x1
  size=0xfffff000    nr_pages=0xfffff
  size=0xfffff001    nr_pages=0x0         << !
  size=0xffffffff    nr_pages=0x0         << !

This is bad because roundup_pow_of_two(n) is undefined when n == 0!

64-bit is not a problem: the unsigned int size is still 4 bytes wide
(as on 32-bit), and the larger, 8-byte-wide unsigned long is
sufficient to hold the largest value of the bit-shift expression:

  size=0xffffffff    nr_pages=0x100000

Modify round_pipe_size() to return 0 if nr_pages == 0, and update its
callers to handle that case accordingly.
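Here is a sketch of the fixed rounding on a 32-bit build. It uses uint32_t to stand in for a 32-bit unsigned long and assumes 4 KiB pages. The helper names are hypothetical. The power-of-two helper is a simple loop and is not the kernel's roundup_pow_of_two().

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages; uint32_t models a 32-bit unsigned long. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT32_C(1) << SKETCH_PAGE_SHIFT)

/* Smallest power of two >= n.  Like roundup_pow_of_two(), it must not be
 * called with 0; valid here for 1 <= n <= 2^31. */
static uint32_t sketch_roundup_pow_of_two(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* 32-bit model of round_pipe_size() with the zero check added. */
static uint32_t sketch_round_pipe_size_32(uint32_t size)
{
    /* size + SKETCH_PAGE_SIZE - 1 wraps for size > 0xfffff000. */
    uint32_t nr_pages = (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

    if (nr_pages == 0)
        return 0;            /* callers must reject a 0 result */
    return sketch_roundup_pow_of_two(nr_pages) << SKETCH_PAGE_SHIFT;
}
```

In this model, sizes above 0x80000000 also come back as 0. Rounding them up to 0x100000 pages makes the final left shift wrap, so a caller that treats 0 as invalid rejects those sizes as well.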

Link: http://lkml.kernel.org/r/1507658689-11669-3-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Lawrence [Fri, 17 Nov 2017 23:29:17 +0000 (15:29 -0800)]
pipe: match pipe_max_size data type with procfs

Patch series "A few round_pipe_size() and pipe-max-size fixups", v3.

While backporting Michael's "pipe: fix limit handling" patchset to a
distro-kernel, Mikulas noticed that current upstream pipe limit handling
contains a few problems:

  1 - procfs signed wrap: echo'ing a large number into
      /proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
      negative value.

  2 - round_pipe_size() nr_pages overflow on 32-bit:  this would
      subsequently try roundup_pow_of_two(0), which is undefined.

  3 - visible non-rounded pipe-max-size value: there is no mutual
      exclusion or protection between the time pipe_max_size is assigned
      a raw value from proc_dointvec_minmax() and when it is rounded.

  4 - unsigned long -> unsigned int conversion makes for potential odd
      return errors from do_proc_douintvec_minmax_conv() and
      do_proc_dopipe_max_size_conv().

This version underwent the same testing as v1:
https://marc.info/?l=linux-kernel&m=150643571406022&w=2

This patch (of 4):

pipe_max_size is defined as an unsigned int:

  unsigned int pipe_max_size = 1048576;

but its procfs/sysctl representation is an integer:

  static struct ctl_table fs_table[] = {
          ...
          {
                  .procname       = "pipe-max-size",
                  .data           = &pipe_max_size,
                  .maxlen         = sizeof(int),
                  .mode           = 0644,
                  .proc_handler   = &pipe_proc_fn,
                  .extra1         = &pipe_min_size,
          },
          ...

that is signed:

  int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
                   size_t *lenp, loff_t *ppos)
  {
          ...
          ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);

This leads to signed results via procfs for large values of pipe_max_size:

  % echo 2147483647 >/proc/sys/fs/pipe-max-size
  % cat /proc/sys/fs/pipe-max-size
  -2147483648

Use unsigned operations on this variable to avoid such negative values.
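The negative value in the example comes from the rounding step. With 4 KiB pages assumed, 2147483647 rounds up to 524288 pages, which is 2147483648 (2^31) bytes and no longer fits in a signed int. The hypothetical helpers below model that step and the signed read-back. They simplify the real code.

```c
#include <limits.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

/* Simplified model of the rounding applied after a write: round up to a
 * power-of-two number of pages.  Meant for sizes 1 .. 2^31. */
static unsigned int sketch_round_to_pow2_pages(unsigned int size)
{
    unsigned long long nr_pages =
        ((unsigned long long)size + (1ULL << SKETCH_PAGE_SHIFT) - 1)
        >> SKETCH_PAGE_SHIFT;
    unsigned long long pages = 1;

    while (pages < nr_pages)
        pages <<= 1;
    return (unsigned int)(pages << SKETCH_PAGE_SHIFT);
}

/* Roughly what an int-based handler reports when it reads the unsigned
 * variable back.  Converting a value above INT_MAX to int is
 * implementation-defined in C; two's-complement compilers wrap it. */
static int sketch_read_back_as_int(unsigned int stored)
{
    return (int)stored;
}
```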

Link: http://lkml.kernel.org/r/1507658689-11669-2-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 17 Nov 2017 23:29:13 +0000 (15:29 -0800)]
autofs: don't fail mount for transient error

Currently if the autofs kernel module gets an error when writing to the
pipe which links to the daemon, then it marks the whole mountpoint as
catatonic, and it will stop working.

It is possible that the error is transient.  This can happen if the
daemon is slow and more than 16 requests queue up.  If a subsequent
process tries to queue a request, and is then signalled, the write to
the pipe will return -ERESTARTSYS and autofs will take that as total
failure.

So change the code to treat -ERESTARTSYS and -ENOMEM as transient
failures which only abort the current request, not the whole mountpoint.
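The new policy amounts to a three-way classification of the result of the pipe write. This is an illustrative sketch with hypothetical names and is not the autofs code. ERESTARTSYS is kernel-internal and absent from userspace <errno.h>, so the sketch defines it with the kernel's value of 512.

```c
#include <errno.h>

/* Kernel-internal restart code, not exported to userspace <errno.h>. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

enum sketch_write_outcome {
    SKETCH_OK,            /* request delivered to the daemon */
    SKETCH_FAIL_REQUEST,  /* transient: fail only this request */
    SKETCH_CATATONIC,     /* fatal: stop the whole mountpoint */
};

/* Classify the result of writing a request packet to the daemon's pipe. */
static enum sketch_write_outcome sketch_classify_write(int ret)
{
    switch (ret) {
    case 0:
        return SKETCH_OK;
    case -ENOMEM:
    case -ERESTARTSYS:
        return SKETCH_FAIL_REQUEST;
    default:
        return SKETCH_CATATONIC;
    }
}
```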

It isn't a crash or a data corruption, but having autofs mountpoints
suddenly stop working is rather inconvenient.

Ian said:

: And given the problems with a half dozen (or so) user space applications
: consuming large amounts of CPU under heavy mount and umount activity this
: could happen more easily than we expect.

Link: http://lkml.kernel.org/r/87y3norvgp.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 17 Nov 2017 23:29:10 +0000 (15:29 -0800)]
init/version.c: include <linux/export.h> instead of <linux/module.h>

init/version.c has nothing to do with modules, so remove the
<linux/module.h> include.

Instead, include <linux/export.h> for EXPORT_SYMBOL_GPL.

This cuts off a lot of unnecessary header parsing.

Link: http://lkml.kernel.org/r/1505920984-8523-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Fri, 17 Nov 2017 23:29:06 +0000 (15:29 -0800)]
epoll: remove ep_call_nested() from ep_eventpoll_poll()

ep_eventpoll_poll(), which is the .poll routine for an epoll fd, uses
ep_call_nested() to prevent excessively deep epoll nesting and circular
paths.

However, we are already preventing these conditions during
EPOLL_CTL_ADD.  In terms of too deep epoll chains, we do in fact allow
deep nesting of the epoll fds themselves (deeper than EP_MAX_NESTS);
however, we don't allow more than EP_MAX_NESTS when an epoll file
descriptor is actually connected to a wakeup source.  Thus, we do not
require the use of ep_call_nested(), since ep_eventpoll_poll(), which is
called via ep_scan_ready_list(), only continues nesting if there are
events available.
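The argument relies on checking once, at EPOLL_CTL_ADD time, so that the poll and wakeup paths need no runtime guard. What follows is a simplified, hypothetical model of such an add-time check. Each node is either an epoll instance or a plain file acting as a wakeup source. The check refuses any add that would create a loop. It refuses a chain only if that chain would exceed the limit while reaching a wakeup source. The limit of 4, measured in edges, is an assumption standing in for EP_MAX_NESTS. The kernel's real checks are more involved.

```c
#include <stdbool.h>

#define SKETCH_MAX_NODES 16
#define SKETCH_MAX_NESTS 4   /* assumed limit, standing in for EP_MAX_NESTS */

/* sketch_adj[p][c] is true once node c has been added to epoll instance p. */
static bool sketch_adj[SKETCH_MAX_NODES][SKETCH_MAX_NODES];
/* true: epoll instance; false: plain file acting as a wakeup source. */
static bool sketch_is_epoll[SKETCH_MAX_NODES];

/* Nodes [0, n_epoll) become epoll instances, the rest plain files. */
static void sketch_reset(int n_epoll)
{
    for (int i = 0; i < SKETCH_MAX_NODES; i++) {
        sketch_is_epoll[i] = i < n_epoll;
        for (int j = 0; j < SKETCH_MAX_NODES; j++)
            sketch_adj[i][j] = false;
    }
}

/* Is 'to' reachable from 'from'?  The graph stays acyclic, so DFS ends. */
static bool sketch_reaches(int from, int to)
{
    if (from == to)
        return true;
    for (int n = 0; n < SKETCH_MAX_NODES; n++)
        if (sketch_adj[from][n] && sketch_reaches(n, to))
            return true;
    return false;
}

/* Longest edge count from node down to a wakeup source, or -1 if none. */
static int sketch_depth_to_source(int node)
{
    int best = -1;

    if (!sketch_is_epoll[node])
        return 0;
    for (int n = 0; n < SKETCH_MAX_NODES; n++) {
        if (!sketch_adj[node][n])
            continue;
        int d = sketch_depth_to_source(n);
        if (d >= 0 && d + 1 > best)
            best = d + 1;
    }
    return best;
}

/* Longest edge count from any top-level instance down to node. */
static int sketch_depth_from_root(int node)
{
    int best = 0;

    for (int p = 0; p < SKETCH_MAX_NODES; p++) {
        if (!sketch_adj[p][node])
            continue;
        int d = sketch_depth_from_root(p) + 1;
        if (d > best)
            best = d;
    }
    return best;
}

/* Model of EPOLL_CTL_ADD: 0 on success, -1 if the add is refused. */
static int sketch_ctl_add(int parent, int child)
{
    int below;

    if (!sketch_is_epoll[parent])
        return -1;                  /* only epoll instances hold others */
    if (sketch_reaches(child, parent))
        return -1;                  /* would close a loop */
    below = sketch_depth_to_source(child);
    if (below >= 0 &&
        sketch_depth_from_root(parent) + 1 + below > SKETCH_MAX_NESTS)
        return -1;                  /* wakeup path would be too deep */
    sketch_adj[parent][child] = true;
    return 0;
}
```

In this model, a chain of six epoll instances is accepted as long as no file sits below it. Attaching a file at the bottom of that chain is then refused, while attaching one higher up is still allowed.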

Since ep_call_nested() is implemented using a global lock, applications
that make use of nested epoll can see large performance improvements
with this change.

Davidlohr said:

: Improvements are quite obscene actually, such as for the following
: epoll_wait() benchmark with 2 level nesting on a 80 core IvyBridge:
:
: ncpus  vanilla     dirty     delta
: 1      2447092     3028315   +23.75%
: 4      231265      2986954   +1191.57%
: 8      121631      2898796   +2283.27%
: 16     59749       2902056   +4757.07%
: 32     26837       2326314   +8568.30%
: 64     12926       1341281   +10276.61%
:
: (http://linux-scalability.org/epoll/epoll-test.c)

Link: http://lkml.kernel.org/r/1509430214-5599-1-git-send-email-jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Salman Qazi <sqazi@google.com>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Fri, 17 Nov 2017 23:29:02 +0000 (15:29 -0800)]
epoll: avoid calling ep_call_nested() from ep_poll_safewake()

ep_poll_safewake() is used to wake up potentially nested epoll file
descriptors.  The function uses ep_call_nested() to prevent entering the
same wakeup queue more than once, and to prevent excessively deep
wakeup paths (deeper than EP_MAX_NESTS).  However, this is not necessary
since we are already preventing these conditions during EPOLL_CTL_ADD.
This saves extra function calls, and avoids taking a global lock during
the ep_call_nested() calls.

I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC
case, since ep_call_nested() keeps track of the nesting level, and this
is required by the call to spin_lock_irqsave_nested().  It would be nice
to remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC
case as well; however, it's not clear how to simply pass the nesting level
through multiple wake_up() levels without more surgery.  In any case, I
don't think CONFIG_DEBUG_LOCK_ALLOC is generally used for production.
This patch also apparently fixes a workload at Google that Salman Qazi
reported by completely removing the poll_safewake_ncalls->lock from
wakeup paths.

Link: http://lkml.kernel.org/r/1507920533-8812-1-git-send-email-jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Salman Qazi <sqazi@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shakeel Butt [Fri, 17 Nov 2017 23:28:59 +0000 (15:28 -0800)]
epoll: account epitem and eppoll_entry to kmemcg

A userspace application can directly trigger the allocations from
eventpoll_epi and eventpoll_pwq slabs.  A buggy or malicious application
can consume a significant amount of system memory by triggering such
allocations.  Indeed, we have seen a case in production where a buggy
application was leaking epoll references and causing a burst of
eventpoll_epi and eventpoll_pwq slab allocations.  This patch opts in to
charging the eventpoll_epi and eventpoll_pwq slabs.

There is a per-user limit (~4% of total memory if no highmem) on these
caches.  I think it is too generous, particularly in the scenario where
jobs of multiple users are running on the system and the administrator
is reducing cost by overcommitting the memory.  This is unaccounted
kernel memory and will not be considered by the oom-killer.  I think by
accounting it to kmemcg, for systems with kmem accounting enabled, we
can provide better isolation between jobs of different users.

Link: http://lkml.kernel.org/r/20171003021519.23907-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 17 Nov 2017 23:28:55 +0000 (15:28 -0800)]
checkpatch: do not check missing blank line before builtin_*_driver

checkpatch.pl does not check for a missing blank line before module_*_driver.
I want it to behave likewise for builtin_*_driver.

Link: http://lkml.kernel.org/r/1505700081-12854-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>