openwrt/staging/blogic.git
12 years agoalpha: get rid of switch_stack argument of do_work_pending()
Al Viro [Thu, 11 Oct 2012 03:50:59 +0000 (23:50 -0400)]
alpha: get rid of switch_stack argument of do_work_pending()

... and now the asm glue side of that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: don't bother passing switch_stack separately from regs
Al Viro [Wed, 5 Sep 2012 22:53:18 +0000 (18:53 -0400)]
alpha: don't bother passing switch_stack separately from regs

It's needed only in setup_sigcontext() and it's always reg - <constant>;
no point passing it all way down through the call chain.  This is just
the signal.c side of that stuff; next will come the asm glue one...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c
Al Viro [Wed, 5 Sep 2012 22:30:34 +0000 (18:30 -0400)]
alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c

Turn the slow side of work_pending into C function, including all
the looping.  What we get out of that:
* we do _not_ call get_signal_to_deliver() with IRQs disabled
anymore
* no need to save/restore volatiles on each pass if there
turns to be more than one (unlikely, but still)
* all double-restart prevention is in C now.
* glue gets simpler.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: simplify TIF_NEED_RESCHED handling
Al Viro [Wed, 5 Sep 2012 22:08:40 +0000 (18:08 -0400)]
alpha: simplify TIF_NEED_RESCHED handling

In case we have both NEED_RESCHED and SIGPENDING/NOTIFY_RESUME,
handle the latter first.  We'll get to original priorities in
the next commit, but now that allows to simplify the treatment
of NEED_RESCHED-only case nicely.  Namely, now there no need to
preserve the data for restarts across the call of schedule() in
$work_resched; we can get there only if we had either returned
from syscall without SIGPENDING (in which case we should've
had no restart-worthy return value and want no restarts) or
already got through do_notify_resume() call (in which case we
want no restarts anymore).  So we can just slap 0 into $19
instead of preserving it (and $20).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: don't open-code trace_report_syscall_{enter,exit}
Al Viro [Sun, 27 May 2012 01:44:21 +0000 (21:44 -0400)]
alpha: don't open-code trace_report_syscall_{enter,exit}

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoUninclude linux/freezer.h
Richard Weinberger [Fri, 25 May 2012 23:57:10 +0000 (01:57 +0200)]
Uninclude linux/freezer.h

This include is no longer needed.
(seems to be a leftover from try_to_freeze())

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agom32r: trim masks
Al Viro [Fri, 1 Jun 2012 02:25:28 +0000 (22:25 -0400)]
m32r: trim masks

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoavr32: trim masks
Al Viro [Fri, 1 Jun 2012 02:24:57 +0000 (22:24 -0400)]
avr32: trim masks

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agotile: don't bother with SIGTRAP in setup_frame
Al Viro [Sun, 27 May 2012 02:22:53 +0000 (22:22 -0400)]
tile: don't bother with SIGTRAP in setup_frame

Tell signal_delivered() to do it instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomicroblaze: don't bother with SIGTRAP in setup_rt_frame()
Al Viro [Mon, 17 Sep 2012 22:42:01 +0000 (18:42 -0400)]
microblaze: don't bother with SIGTRAP in setup_rt_frame()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomn10300: don't bother with SIGTRAP in setup_frame()
Al Viro [Sun, 27 May 2012 02:20:22 +0000 (22:20 -0400)]
mn10300: don't bother with SIGTRAP in setup_frame()

... just tell signal_delivered() to do it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agofrv: no need to raise SIGTRAP in setup_frame()
Al Viro [Sun, 27 May 2012 02:17:47 +0000 (22:17 -0400)]
frv: no need to raise SIGTRAP in setup_frame()

signal_delivered() will do it in the same case...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agox86: get rid of duplicate code in case of CONFIG_VM86
Al Viro [Sat, 26 May 2012 05:07:39 +0000 (01:07 -0400)]
x86: get rid of duplicate code in case of CONFIG_VM86

no need to have the call of do_notify_resume() + checks around it
duplicated for vm86 case - a bit of rearranging of ifdefs and we'll
have a perfectly fine copy to jump back to.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agounicore32: remove pointless test
Al Viro [Sat, 26 May 2012 04:25:15 +0000 (00:25 -0400)]
unicore32: remove pointless test

we can get into work_pending only if at least one of NEED_RESCHED,
SIGPENDING or NOTIFY_RESUME is set.  So once we'd found no NEED_RESCHED,
there's no need to check that one of the other two is set.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoh8300: trim _TIF_WORK_MASK
Al Viro [Thu, 24 May 2012 07:12:25 +0000 (03:12 -0400)]
h8300: trim _TIF_WORK_MASK

Only the three usual flags (NEED_RESCHED/SIGPENDING/NOTIFY_RESUME)
are looked at in the code checking _TIF_WORK_MASK on that one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoparisc: decide whether to go to slow path (tracesys) based on thread flags
Al Viro [Sun, 20 May 2012 15:59:03 +0000 (11:59 -0400)]
parisc: decide whether to go to slow path (tracesys) based on thread flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoparisc: don't bother looping in do_signal()
Al Viro [Sat, 19 May 2012 05:13:01 +0000 (01:13 -0400)]
parisc: don't bother looping in do_signal()

entry.S code had been looping until no pending signals are left
since 2005 anyway; no need to bother with that in do_signal()
itself.  If the failure to set a sigframe up raises SIGSEGV,
we'll just pick it up the next time around the loop(s) in entry.S
anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoparisc: fix double restarts
Al Viro [Sat, 19 May 2012 04:29:22 +0000 (00:29 -0400)]
parisc: fix double restarts

Don't bother restoring r28 on syscall restarts; it's clobbered by
syscall anyway.  Reuse (now unused) ->orig_r28 as "no restarts allowed"
flag.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agobury the rest of TIF_IRET
Al Viro [Mon, 7 May 2012 21:37:26 +0000 (17:37 -0400)]
bury the rest of TIF_IRET

Some architectures had blindly copied it for no reason whatsoever.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agosanitize tsk_is_polling()
Al Viro [Fri, 1 Jun 2012 18:22:01 +0000 (14:22 -0400)]
sanitize tsk_is_polling()

Make default just return 0.  The current default (checking
TIF_POLLING_NRFLAG) is taken to architectures that need it;
ones that don't do polling in their idle threads don't need
to defined TIF_POLLING_NRFLAG at all.

ia64 defined both TS_POLLING (used by its tsk_is_polling())
and TIF_POLLING_NRFLAG (not used at all).  Killed the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agobury _TIF_RESTORE_SIGMASK
Al Viro [Fri, 1 Jun 2012 18:21:16 +0000 (14:21 -0400)]
bury _TIF_RESTORE_SIGMASK

never used...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agounicore32: unobfuscate _TIF_WORK_MASK
Al Viro [Sat, 5 May 2012 22:29:41 +0000 (18:29 -0400)]
unicore32: unobfuscate _TIF_WORK_MASK

bits 3..7 in flags are never set there, so this 0xff is pointless

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomips: NOTIFY_RESUME is not needed in TIF masks
Al Viro [Sat, 5 May 2012 20:24:40 +0000 (16:24 -0400)]
mips: NOTIFY_RESUME is not needed in TIF masks

If it's set, SIGPENDING is also set.  And SIGPENDING is present in
the masks...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomips: merge the identical "return from syscall" per-ABI code
Al Viro [Sat, 5 May 2012 20:11:35 +0000 (16:11 -0400)]
mips: merge the identical "return from syscall" per-ABI code

No need to keep 4 copies of that stuff; merged and taken to
entry.S, unused public symbols there killed off.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomips: unobfuscate _TIF..._MASK
Al Viro [Sat, 5 May 2012 19:57:00 +0000 (15:57 -0400)]
mips: unobfuscate _TIF..._MASK

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomips: prevent hitting do_notify_resume() with !user_mode(regs)
Al Viro [Thu, 3 May 2012 01:45:12 +0000 (21:45 -0400)]
mips: prevent hitting do_notify_resume() with !user_mode(regs)

too late to do anything there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoia64: can't reach do_signal() when returning to kernel mode
Al Viro [Tue, 1 May 2012 22:37:16 +0000 (18:37 -0400)]
ia64: can't reach do_signal() when returning to kernel mode

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoscore: fix bogus restarts on sigreturn()
Al Viro [Tue, 1 May 2012 03:37:41 +0000 (23:37 -0400)]
score: fix bogus restarts on sigreturn()

we *really* don't want to have restart logics hit when we are returning from
sigreturn() - random replacement of %r4 with -4 just because a signal had
been noticed from timer interrupt that came when %r4 happened to contain
-514 is not nice at all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomn10300: get rid of calling do_notify_resume() when returning to kernel mode
Al Viro [Mon, 30 Apr 2012 23:49:23 +0000 (19:49 -0400)]
mn10300: get rid of calling do_notify_resume() when returning to kernel mode

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoxtensa: can't get to do_notify_resume() when user_mode(regs) is not true
Al Viro [Sun, 29 Apr 2012 02:42:43 +0000 (22:42 -0400)]
xtensa: can't get to do_notify_resume() when user_mode(regs) is not true

asm glue checks that

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoc6x: switch to generic kernel_thread()
Al Viro [Sat, 22 Sep 2012 22:23:49 +0000 (18:23 -0400)]
c6x: switch to generic kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoc6x: switch to generic sys_execve
Mark Salter [Fri, 21 Sep 2012 16:26:39 +0000 (12:26 -0400)]
c6x: switch to generic sys_execve

Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoc6x: switch to generic kernel_execve
Mark Salter [Fri, 21 Sep 2012 16:26:38 +0000 (12:26 -0400)]
c6x: switch to generic kernel_execve

Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoc6x: add ret_from_kernel_thread(), simplify kernel_thread()
Mark Salter [Fri, 21 Sep 2012 16:26:37 +0000 (12:26 -0400)]
c6x: add ret_from_kernel_thread(), simplify kernel_thread()

Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomn10300: convert to generic kernel_thread()
Al Viro [Sat, 22 Sep 2012 22:18:23 +0000 (18:18 -0400)]
mn10300: convert to generic kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomn10300: switch to generic kernel_execve()
Al Viro [Wed, 19 Sep 2012 17:18:20 +0000 (13:18 -0400)]
mn10300: switch to generic kernel_execve()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomn10300: switch to generic sys_execve()
Al Viro [Wed, 19 Sep 2012 17:08:13 +0000 (13:08 -0400)]
mn10300: switch to generic sys_execve()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agomn10300: split ret_from_fork, simplify kernel_thread()
Al Viro [Wed, 19 Sep 2012 17:05:49 +0000 (13:05 -0400)]
mn10300: split ret_from_fork, simplify kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agofrv: switch to generic kernel_thread()
Al Viro [Sat, 22 Sep 2012 22:10:15 +0000 (18:10 -0400)]
frv: switch to generic kernel_thread()

12 years agofrv: switch to generic kernel_execve
Al Viro [Wed, 19 Sep 2012 02:33:44 +0000 (22:33 -0400)]
frv: switch to generic kernel_execve

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agofrv: switch to generic sys_execve()
Al Viro [Wed, 19 Sep 2012 02:25:02 +0000 (22:25 -0400)]
frv: switch to generic sys_execve()

current_pt_regs() here is simply __frame

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agofrv: split ret_from_fork, simplify kernel_thread() a lot
Al Viro [Wed, 19 Sep 2012 02:18:51 +0000 (22:18 -0400)]
frv: split ret_from_fork, simplify kernel_thread() a lot

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agom68k: switch to generic sys_execve()/kernel_execve()
Al Viro [Sun, 16 Sep 2012 16:06:34 +0000 (12:06 -0400)]
m68k: switch to generic sys_execve()/kernel_execve()

The tricky part here is that task_pt_regs() on m68k works *only* for
process inside do_signal().  However, we need something much simpler -
pt_regs of a process inside do_signal() may be at different offsets
from the stack bottom, depending on the way we'd entered the kernel,
but for a task inside sys_execve() it *is* at constant offset.
Moreover, for a kernel thread about to become a userland process the
same location is also fine - setting sp to that will leave the kernel
stack pointer at the very bottom of the kernel stack when we finally
switch to userland.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agom68k: split ret_from_fork(), simplify kernel_thread()
Al Viro [Sun, 16 Sep 2012 16:05:09 +0000 (12:05 -0400)]
m68k: split ret_from_fork(), simplify kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agom68k: always set stack frame format for ColdFire on thread start
Greg Ungerer [Wed, 12 Sep 2012 09:13:19 +0000 (05:13 -0400)]
m68k: always set stack frame format for ColdFire on thread start

The stack frame "format" field needs to be explicitly set on thread creation
on ColdFire. For a normal long word aligned user stack pointer the frame
format is 0x4.

We were doing this for non-MMU ColdFire, but not for the case with MMU enabled.
So fix it so we always do it if targeting ColdFire.

The old code happend to rely on the stack frame format being inhereted from
the process calling exec. Furture changes means that may not always work,
so we really do want to set it explicitly.

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agopowerpc: switch to generic sys_execve()/kernel_execve()
Al Viro [Fri, 31 Aug 2012 19:48:05 +0000 (15:48 -0400)]
powerpc: switch to generic sys_execve()/kernel_execve()

the only non-obvious part is that current_pt_regs() is really needed
here - task_pt_regs() is NULL for kernel threads; it's OK for ptrace
uses (the thing task_pt_regs() is intended for), but not for us.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agopowerpc: split ret_from_fork
Al Viro [Wed, 12 Sep 2012 22:32:42 +0000 (18:32 -0400)]
powerpc: split ret_from_fork

... and get rid of in-kernel syscalls in kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agos390: convert to generic kernel_execve()
Al Viro [Thu, 6 Sep 2012 21:08:47 +0000 (17:08 -0400)]
s390: convert to generic kernel_execve()

same situation as with alpha and arm - only massage needed

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agos390: switch to generic kernel_thread()
Al Viro [Sat, 22 Sep 2012 00:48:32 +0000 (20:48 -0400)]
s390: switch to generic kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agos390: fold kernel_thread_helper() into ret_from_fork()
Al Viro [Mon, 10 Sep 2012 22:03:41 +0000 (18:03 -0400)]
s390: fold kernel_thread_helper() into ret_from_fork()

... and don't bother with syscall return path in case of kernel
threads.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agos390: fold execve_tail() into start_thread(), convert to generic sys_execve()
Al Viro [Thu, 6 Sep 2012 19:48:11 +0000 (15:48 -0400)]
s390: fold execve_tail() into start_thread(), convert to generic sys_execve()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoum: switch to generic kernel_thread()
Al Viro [Sat, 22 Sep 2012 00:32:29 +0000 (20:32 -0400)]
um: switch to generic kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agox86, um/x86: switch to generic sys_execve and kernel_execve
Al Viro [Thu, 2 Aug 2012 19:05:11 +0000 (23:05 +0400)]
x86, um/x86: switch to generic sys_execve and kernel_execve

32bit wrapper is lost on that; 64bit one is *not*, since
we need to arrange for full pt_regs on stack when we call
sys_execve() and we need to load callee-saved ones from
there afterwards.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agox86: split ret_from_fork
Al Viro [Mon, 10 Sep 2012 20:44:54 +0000 (16:44 -0400)]
x86: split ret_from_fork

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
Al Viro [Fri, 1 Jun 2012 02:22:52 +0000 (22:22 -0400)]
alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: switch to generic kernel_thread()
Al Viro [Mon, 10 Sep 2012 02:03:42 +0000 (22:03 -0400)]
alpha: switch to generic kernel_thread()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoalpha: switch to generic sys_execve()
Al Viro [Thu, 2 Aug 2012 08:53:03 +0000 (12:53 +0400)]
alpha: switch to generic sys_execve()

get rid of sys_execve() wrapper, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoarm: get rid of execve wrapper, switch to generic execve() implementation
Al Viro [Thu, 2 Aug 2012 07:52:41 +0000 (11:52 +0400)]
arm: get rid of execve wrapper, switch to generic execve() implementation

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoarm: optimized current_pt_regs()
Al Viro [Thu, 2 Aug 2012 07:49:43 +0000 (11:49 +0400)]
arm: optimized current_pt_regs()

... no need to read current_thread_info()->task only to
feed it to task_thread_page() immediately afterwards.
Moreover, not using current_thread_info() at all ends
up with better assembler - we need a location very close
to the top of kernel stack page and it's actually better
to do or with 0x1fff, followed be subtracting a small
constant than and with ~0x1fff, followed by adding a large
one.  Both & and | would be a couple of insns (mvn lsr/mvn lsl
for |, a pair of bic for &), but the following addition
would cost a pair of add while the subtraction ends up
as a single sub.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoarm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
Al Viro [Thu, 2 Aug 2012 07:46:39 +0000 (11:46 +0400)]
arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoarm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
Al Viro [Mon, 10 Sep 2012 01:31:07 +0000 (21:31 -0400)]
arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agogeneric sys_execve()
Al Viro [Sun, 30 Sep 2012 17:38:55 +0000 (13:38 -0400)]
generic sys_execve()

Selected by __ARCH_WANT_SYS_EXECVE in unistd.h.  Requires
* working current_pt_regs()
* *NOT* doing a syscall-in-kernel kind of kernel_execve()
implementation.  Using generic kernel_execve() is fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agogeneric kernel_execve()
Al Viro [Sun, 30 Sep 2012 17:20:09 +0000 (13:20 -0400)]
generic kernel_execve()

based mostly on arm and alpha versions.  Architectures can define
__ARCH_WANT_KERNEL_EXECVE and use it, provided that
* they have working current_pt_regs(), even for kernel threads.
* kernel_thread-spawned threads do have space for pt_regs
in the normal location.  Normally that's as simple as switching to
generic kernel_thread() and making sure that kernel threads do *not*
go through return from syscall path; call the payload from equivalent
of ret_from_fork if we are in a kernel thread (or just have separate
ret_from_kernel_thread and make copy_thread() use it instead of
ret_from_fork in kernel thread case).
* they have ret_from_kernel_execve(); it is called after
successful do_execve() done by kernel_execve() and gets normal
pt_regs location passed to it as argument.  It's essentially
a longjmp() analog - it should set sp, etc. to the situation
expected at the return for syscall and go there.  Eventually
the need for that sucker will disappear, but that'll take some
surgery on kernel_thread() payloads.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agonew helper: current_pt_regs()
Al Viro [Sun, 30 Sep 2012 17:12:36 +0000 (13:12 -0400)]
new helper: current_pt_regs()

Normally (and that's the default) it's just task_pt_regs(current).
However, if an architecture can optimize that, it can do so by
making a macro of its own available from asm/ptrace.h.  More
importantly, some architectures have task_pt_regs() working only
for traced tasks blocked on signal delivery.  current_pt_regs()
needs to work for *all* processes, so before those architectures
start using stuff relying on current_pt_regs() they'll need a
properly working variant.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agopreparation for generic kernel_thread()
Al Viro [Fri, 21 Sep 2012 23:55:31 +0000 (19:55 -0400)]
preparation for generic kernel_thread()

Let architectures select GENERIC_KERNEL_THREAD and have their copy_thread()
treat NULL regs as "it came from kernel_thread(), sp argument contains
the function new thread will be calling and stack_size - the argument for
that function".  Switching the architectures begins shortly...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoum: kill thread->forking
Al Viro [Thu, 20 Sep 2012 13:28:25 +0000 (09:28 -0400)]
um: kill thread->forking

we only use that to tell copy_thread() done by syscall from that
done by kernel_thread().  However, it's easier to do simply by
checking PF_KTHREAD in thread flags.

Merge sys_clone() guts for 32bit and 64bit, while we are at it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoum: let signal_delivered() do SIGTRAP on singlestepping into handler
Al Viro [Thu, 6 Sep 2012 17:39:47 +0000 (13:39 -0400)]
um: let signal_delivered() do SIGTRAP on singlestepping into handler

... rather than duplicating that in sigframe setup code (and doing that
inconsistently, at that)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoum: don't leak floating point state and segment registers on execve()
Al Viro [Thu, 6 Sep 2012 03:20:33 +0000 (23:20 -0400)]
um: don't leak floating point state and segment registers on execve()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoum: take cleaning singlestep to start_thread()
Al Viro [Mon, 3 Sep 2012 07:24:18 +0000 (03:24 -0400)]
um: take cleaning singlestep to start_thread()

... assuming it's needed to be done at all

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agodon't bother exporting kernel_execve()
Al Viro [Thu, 2 Aug 2012 18:27:53 +0000 (22:27 +0400)]
don't bother exporting kernel_execve()

most of the architectures don't and there's not a single
caller outside of core kernel.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agothe only place that needs to include asm/exec.h is linux/binfmts.h
Al Viro [Fri, 3 Aug 2012 08:14:44 +0000 (12:14 +0400)]
the only place that needs to include asm/exec.h is linux/binfmts.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoget rid of generic instances of asm/exec.h
Al Viro [Fri, 3 Aug 2012 08:12:38 +0000 (12:12 +0400)]
get rid of generic instances of asm/exec.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agox86: get rid of TIF_IRET hackery
Al Viro [Thu, 2 Aug 2012 18:12:06 +0000 (22:12 +0400)]
x86: get rid of TIF_IRET hackery

TIF_NOTIFY_RESUME will work in precisely the same way; all that
is achieved by TIF_IRET is appearing that there's some work to be
done, so we end up on the iret exit path.  Just use NOTIFY_RESUME.
And for execve() do that in 32bit start_thread(), not sys_execve()
itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
12 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Wed, 19 Sep 2012 18:04:34 +0000 (11:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A small collection of driver fixes/updates and a core fix for 3.6.  It
  contains:

   - Bug fixes for mtip32xx, and support for new hardware (just addition
     of IDs).  They have been queued up for 3.7 for a few weeks as well.

   - rate-limit a failing command error message in block core.

   - A fix for an old cciss bug from Stephen.

   - Prevent overflow of partition count from Alan."

* 'for-linus' of git://git.kernel.dk/linux-block:
  cciss: fix handling of protocol error
  blk: add an upper sanity check on partition adding
  mtip32xx: fix user_buffer check in exec_drive_command
  mtip32xx: Remove dead code
  mtip32xx: Change printk to pr_xxxx
  mtip32xx: Proper reporting of write protect status on big-endian
  mtip32xx: Increase timeout for standby command
  mtip32xx: Handle NCQ commands during the security locked state
  mtip32xx: Add support for new devices
  block: rate-limit the error message from failing commands

12 years agoMerge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
Linus Torvalds [Wed, 19 Sep 2012 18:03:55 +0000 (11:03 -0700)]
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.
  sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path
  sh: intc: Fix up multi-evt irq association.

12 years agoMerge tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg
Linus Torvalds [Wed, 19 Sep 2012 18:03:13 +0000 (11:03 -0700)]
Merge tag 'rpmsg-3.6-fix' of git://git./linux/kernel/git/ohad/rpmsg

Pull rpmsg fix from Ohad Ben-Cohen:
 "A quick rpmsg fix from Fernando, fixing two buggy invocations of
  dma_free_coherent"

* tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
  rpmsg: fix dma_free_coherent dev parameter

12 years agoMerge tag 'md-3.6-fixes' of git://neil.brown.name/md
Linus Torvalds [Wed, 19 Sep 2012 18:01:38 +0000 (11:01 -0700)]
Merge tag 'md-3.6-fixes' of git://neil.brown.name/md

Pull md fixes from NeilBrown:
 "3 fixes for md in 3.6.

  One reverts a recent patch which turns out to not be such a good idea.

  Other two fix minor bugs with the new (since 3.3) 'replacement' code
  and have been tagged for -stable."

* tag 'md-3.6-fixes' of git://neil.brown.name/md:
  md: make sure metadata is updated when spares are activated or removed.
  md/raid5: fix calculate of 'degraded' when a replacement becomes active.
  Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."

12 years agoMerge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Wed, 19 Sep 2012 18:00:07 +0000 (11:00 -0700)]
Merge branch 'for-3.6-fixes' of git://git./linux/kernel/git/tj/wq

Pull workqueue / powernow-k8 fix from Tejun Heo:
 "This is the fix for the bug where cpufreq/powernow-k8 was tripping
  BUG_ON() in try_to_wake_up_local() by migrating workqueue worker to a
  different CPU.

    https://bugzilla.kernel.org/show_bug.cgi?id=47301

  As discussed, the fix is now two parts - one to reimplement
  work_on_cpu() so that it doesn't create a new kthread each time and
  the actual fix which makes powernow-k8 use work_on_cpu() instead of
  performing manual migration.

  While pretty late in the merge cycle, both changes are on the safer
  side.  Jiri and I verified two existing users of work_on_cpu() and
  Duncan confirmed that the powernow-k8 fix survived about 18 hours of
  testing."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
  workqueue: reimplement work_on_cpu() using system_wq

12 years agocpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
Tejun Heo [Tue, 18 Sep 2012 21:24:59 +0000 (14:24 -0700)]
cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU

powernowk8_target() runs off a per-cpu work item and if the
cpufreq_policy->cpu is different from the current one, it migrates the
kworker to the target CPU by manipulating current->cpus_allowed.  The
function migrates the kworker back to the original CPU but this is
still broken.  Workqueue concurrency management requires the kworkers
to stay on the same CPU and powernowk8_target() ends up triggerring
BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
fidvid_mutex and sleeps.

It is unclear why this bug is being reported now.  Duncan says it
appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
3.5.  Bisection seemed to point to 63d95a91 "workqueue: use @pool
instead of @gcwq or @cpu where applicable" which is an non-functional
change.  Given that the reproduce case sometimes took upto days to
trigger, it's easy to be misled while bisecting.  Maybe something made
contention on fidvid_mutex more likely?  I don't know.

This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
isn't the same as the current one.  The code assumes that
cpufreq_policy->cpu is kept online by the caller, which Rafael tells
me is the case.

stable: ed48ece27c ("workqueue: reimplement work_on_cpu() using
        system_wq") should be applied before this; otherwise, the
        behavior could be horrible.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Tested-by: Duncan <1i5t5.duncan@cox.net>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301

12 years agoworkqueue: reimplement work_on_cpu() using system_wq
Tejun Heo [Tue, 18 Sep 2012 19:48:43 +0000 (12:48 -0700)]
workqueue: reimplement work_on_cpu() using system_wq

The existing work_on_cpu() implementation is hugely inefficient.  It
creates a new kthread, execute that single function and then let the
kthread die on each invocation.

Now that system_wq can handle concurrent executions, there's no
advantage of doing this.  Reimplement work_on_cpu() using system_wq
which makes it simpler and way more efficient.

stable: While this isn't a fix in itself, it's needed to fix a
        workqueue related bug in cpufreq/powernow-k8.  AFAICS, this
        shouldn't break other existing users.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
12 years agomd: make sure metadata is updated when spares are activated or removed.
NeilBrown [Wed, 19 Sep 2012 02:54:22 +0000 (12:54 +1000)]
md: make sure metadata is updated when spares are activated or removed.

It isn't always necessary to update the metadata when spares are
removed as the presence-or-not of a spare isn't really important to
the integrity of an array.
Also activating a spare doesn't always require updating the metadata
as the update on 'recovery-completed' is usually sufficient.

However the introduction of 'replacement' devices have made these
transitions sometimes more important.  For example the 'Replacement'
flag isn't cleared until the original device is removed, so we need
to ensure a metadata update after that 'spare' is removed.

So set MD_CHANGE_DEVS whenever a spare is activated or removed, to
complement the current situation where it is set when a spare is added
or a device is failed (or a number of other less common situations).

This is suitable for -stable as out-of-data metadata could lead
to data corruption.
This is only relevant for 3.3 and later 9when 'replacement' as
introduced.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agomd/raid5: fix calculate of 'degraded' when a replacement becomes active.
NeilBrown [Wed, 19 Sep 2012 02:52:30 +0000 (12:52 +1000)]
md/raid5: fix calculate of 'degraded' when a replacement becomes active.

When a replacement device becomes active, we mark the device that it
replaces as 'faulty' so that it can subsequently get removed.
However 'calc_degraded' only pays attention to the primary device, not
the replacement, so the array appears to become degraded, which is
wrong.

So teach 'calc_degraded' to consider any replacement if a primary
device is faulty.

This is suitable for -stable as an incorrect 'degraded' value can
confuse md and could lead to data corruption.
This is only relevant for 3.3 and later.

Cc: stable@vger.kernel.org
Reported-by: Robin Hill <robin@robinhill.me.uk>
Reported-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agoRevert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."
NeilBrown [Wed, 19 Sep 2012 02:48:30 +0000 (12:48 +1000)]
Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."

This reverts commit 895e3c5c58a80bb9e4e05d9ac38b4f30e0f97d80.

While this patch seemed like a good idea and did help some workloads,
it hurts other workloads.
Large sequential O_DIRECT writes were faster,
Small random O_DIRECT writes were slower.

Other changes (batching RAID5 writes) have improved the sequential
writes using a different mechanism, so the net result of this patch
is definitely negative.  So revert it.

Reported-by: Shaohua Li <shli@kernel.org>
Tested-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agoMerge tag 'hwspinlock-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad...
Linus Torvalds [Tue, 18 Sep 2012 18:58:54 +0000 (11:58 -0700)]
Merge tag 'hwspinlock-3.6-fix' of git://git./linux/kernel/git/ohad/hwspinlock

Pull hwspinlock fix from Ohad Ben-Cohen:
 "A single hwspinlock fix by Wei Yongjun, which prevents potential NULL
  dereferences"

* tag 'hwspinlock-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
  hwspinlock/core: move the dereference below the NULL test

12 years agovfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()
Miklos Szeredi [Mon, 17 Sep 2012 20:31:38 +0000 (22:31 +0200)]
vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()

IBM reported a soft lockup after applying the fix for the rename_lock
deadlock.  Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.

The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed.  This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.

This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.

IBM reported successful test results with this patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agocciss: fix handling of protocol error
Stephen M. Cameron [Fri, 14 Sep 2012 21:35:10 +0000 (16:35 -0500)]
cciss: fix handling of protocol error

If a command completes with a status of CMD_PROTOCOL_ERR, this
information should be conveyed to the SCSI mid layer, not dropped
on the floor.  Unlike a similar bug in the hpsa driver, this bug
only affects tape drives and CD and DVD ROM drives in the cciss
driver, and to induce it, you have to disconnect (or damage) a
cable, so it is not a very likely scenario (which would explain
why the bug has gone undetected for the last 10 years.)

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
12 years agoblk: add an upper sanity check on partition adding
Alan Cox [Mon, 17 Sep 2012 10:47:13 +0000 (11:47 +0100)]
blk: add an upper sanity check on partition adding

65536 should be ludicrous anyway but without it we overflow the
memory computation doing the allocation and badness occurs.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
12 years agosh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.
Al Viro [Tue, 18 Sep 2012 08:04:37 +0000 (17:04 +0900)]
sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.

As Al notes, we missed a TIF_NOTIFY_RESUME check which caused any
handlers without TIF_SIGPENDING also set to skip the notification:

Looks like while it is in the relevant masks *and* checked in
do_notify_resume() both on 32bit and 64bit variants since commit
ab99c733ae73cce31f2a2434f7099564e5a73d95 ("sh: Make syscall tracer
use tracehook notifiers, add TIF_NOTIFY_RESUME.") they are
actually *not* reached without simulataneous SIGPENDING, since
the actual glue in the callers had not been updated back then and
still checks for _TIF_SIGPENDING alone when deciding whether to
hit do_notify_resume() or not.

Reported-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
12 years agosh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path
Laurent Pinchart [Fri, 14 Sep 2012 18:25:48 +0000 (20:25 +0200)]
sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path

The sh_pfc_gpio_request_enable() function acquires a spinlock but fails
to release it before returning if the requested mux type is not
supported. Fix this.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
12 years agoMerge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Mon, 17 Sep 2012 23:05:23 +0000 (16:05 -0700)]
Merge branch 'for-3.6-fixes' of git://git./linux/kernel/git/tj/wq

Pull another workqueue fix from Tejun Heo:
 "Unfortunately, yet another late fix.  This too is discovered and fixed
  by Lai.  This bug was introduced during this merge window by commit
  25511a477657 ("workqueue: reimplement CPU online rebinding to handle
  idle workers") which started using WORKER_REBIND flag for idle rebind
  too.

  The bug is relatively easy to trigger if the CPU rapidly goes through
  off, on and then off (and stay off).  The fix is on the safer side.
  This hasn't been on linux-next yet but I'm pushing early so that it
  can get more exposure before v3.6 release."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()

12 years agoworkqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
Lai Jiangshan [Mon, 17 Sep 2012 22:42:31 +0000 (15:42 -0700)]
workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()

busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
(CPU is down again).  This used to be okay because the flag wasn't
used for anything else.

However, after 25511a477 "workqueue: reimplement CPU online rebinding
to handle idle workers", WORKER_REBIND is also used to command idle
workers to rebind.  If not cleared, the worker may confuse the next
CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
prematurely calling idle_worker_rebind().

  WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
 00()
  Hardware name: Bochs
  Modules linked in: test_wq(O-)
  Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
  Call Trace:
   [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
   [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
   [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
   [<ffffffff810bc16e>] kthread+0xbe/0xd0
   [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
  ---[ end trace e977cf20f4661968 ]---
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: test_wq(O-)
  CPU 0
  Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
  RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
  RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
  RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
  RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
  R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
  FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
  Stack:
   ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
   ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
   ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
  Call Trace:
   [<ffffffff810bc16e>] kthread+0xbe/0xd0
   [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
  Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
  RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
   RSP <ffff88001e1c9de0>
  CR2: 0000000000000000

There was no reason to keep WORKER_REBIND on failure in the first
place - WORKER_UNBOUND is guaranteed to be set in such cases
preventing incorrectly activating concurrency management.  Always
clear WORKER_REBIND.

tj: Updated comment and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
12 years agoMerge branch 'akpm' (Andrew's patch-bomb)
Linus Torvalds [Mon, 17 Sep 2012 22:01:14 +0000 (15:01 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)

Merge fixes from Andrew Morton:
 "13 patches.  12 are fixes and one is a little preparatory thing for
  Andi."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (13 commits)
  memory hotplug: fix section info double registration bug
  mm/page_alloc: fix the page address of higher page's buddy calculation
  drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe
  compiler.h: add __visible
  pid-namespace: limit value of ns_last_pid to (0, max_pid)
  include/net/sock.h: squelch compiler warning in sk_rmem_schedule()
  slub: consider pfmemalloc_match() in get_partial_node()
  slab: fix starting index for finding another object
  slab: do ClearSlabPfmemalloc() for all pages of slab
  nbd: clear waiting_queue on shutdown
  MAINTAINERS: fix TXT maintainer list and source repo path
  mm/ia64: fix a memory block size bug
  memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails

12 years agomemory hotplug: fix section info double registration bug
qiuxishi [Mon, 17 Sep 2012 21:09:24 +0000 (14:09 -0700)]
memory hotplug: fix section info double registration bug

There may be a bug when registering section info.  For example, on my
Itanium platform, the pfn range of node0 includes the other nodes, so
other nodes' section info will be double registered, and memmap's page
count will equal to 3.

  node0: start_pfn=0x100,    spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
  node1: start_pfn=0x80000,  spanned_pfn=0x80000,  present_pfn=0x80000, => 0x080000-0x100000
  node2: start_pfn=0x100000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x100000-0x180000
  node3: start_pfn=0x180000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x180000-0x200000

  free_all_bootmem_node()
register_page_bootmem_info_node()
register_page_bootmem_info_section()

When hot remove memory, we can't free the memmap's page because
page_count() is 2 after put_page_bootmem().

  sparse_remove_one_section()
free_section_usemap()
free_map_bootmem()
put_page_bootmem()

[akpm@linux-foundation.org: add code comment]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm/page_alloc: fix the page address of higher page's buddy calculation
Li Haifeng [Mon, 17 Sep 2012 21:09:21 +0000 (14:09 -0700)]
mm/page_alloc: fix the page address of higher page's buddy calculation

The heuristic method for buddy has been introduced since commit
43506fad21ca ("mm/page_alloc.c: simplify calculation of combined index
of adjacent buddy lists").  But the page address of higher page's buddy
was wrongly calculated, which will lead page_is_buddy to fail for ever.
IOW, the heuristic method would be disabled with the wrong page address
of higher page's buddy.

Calculating the page address of higher page's buddy should be based
higher_page with the offset between index of higher page and index of
higher page's buddy.

Signed-off-by: Haifeng Li <omycle@gmail.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: <stable@vger.kernel.org> [2.6.38+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodrivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe
Kevin Hilman [Mon, 17 Sep 2012 21:09:17 +0000 (14:09 -0700)]
drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe

On some platforms, bootloaders are known to do some interesting RTC
programming.  Without going into the obscurities as to why this may be
the case, suffice it to say the the driver should not make any
assumptions about the state of the RTC when the driver loads.  In
particular, the driver probe should be sure that all interrupts are
disabled until otherwise programmed.

This was discovered when finding bursty I2C traffic every second on
Overo platforms.  This I2C overhead was keeping the SoC from hitting
deep power states.  The cause was found to be the RTC firing every
second on the I2C-connected TWL PMIC.

Special thanks to Felipe Balbi for suggesting to look for a rogue driver
as the source of the I2C traffic rather than the I2C driver itself.

Special thanks to Steve Sakoman for helping track down the source of the
continuous RTC interrups on the Overo boards.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Felipe Balbi <balbi@ti.com>
Tested-by: Steve Sakoman <steve@sakoman.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Tested-by: Shubhrajyoti Datta <omaplinuxkernel@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agocompiler.h: add __visible
Andi Kleen [Mon, 17 Sep 2012 21:09:15 +0000 (14:09 -0700)]
compiler.h: add __visible

gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away.  Add a __visible macro to
use it with that compiler version or later.

This is used (at least) by the "Link Time Optimization" patchset.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agopid-namespace: limit value of ns_last_pid to (0, max_pid)
Andrew Vagin [Mon, 17 Sep 2012 21:09:12 +0000 (14:09 -0700)]
pid-namespace: limit value of ns_last_pid to (0, max_pid)

The kernel doesn't check the pid for negative values, so if you try to
write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic.

The crash happens because the next pid is -1, and alloc_pidmap() will
try to access to a nonexistent pidmap.

  map = &pid_ns->pidmap[pid/BITS_PER_PAGE];

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoinclude/net/sock.h: squelch compiler warning in sk_rmem_schedule()
Chuck Lever [Mon, 17 Sep 2012 21:09:11 +0000 (14:09 -0700)]
include/net/sock.h: squelch compiler warning in sk_rmem_schedule()

This warning:

  In file included from linux/include/linux/tcp.h:227:0,
                   from linux/include/linux/ipv6.h:221,
                   from linux/include/net/ipv6.h:16,
                   from linux/include/linux/sunrpc/clnt.h:26,
                   from linux/net/sunrpc/stats.c:22:
  linux/include/net/sock.h: In function `sk_rmem_schedule':
  linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.

Commit c76562b6709f ("netvm: prevent a stream-specific deadlock")
accidentally replaced the "size" parameter of sk_rmem_schedule() with an
unsigned int.  This changes the semantics of the comparison in the
return statement.

In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer.  In addition, __sk_mem_schedule() takes a
signed integer for its "size" parameter, so there is an implicit type
conversion in sk_rmem_schedule() anyway.

Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are exactly
the same.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoslub: consider pfmemalloc_match() in get_partial_node()
Joonsoo Kim [Mon, 17 Sep 2012 21:09:09 +0000 (14:09 -0700)]
slub: consider pfmemalloc_match() in get_partial_node()

get_partial() is currently not checking pfmemalloc_match() meaning that
it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
This is a problem in the following situation.  Assume that there is a
request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.

In this case, slab_alloc enters the slow path and new_slab_objects() is
called which may return a PFMEMALLOC page.  As the current user is not
allowed to access PFMEMALLOC page, deactivate_slab() is called
([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
checks]) and returns an object from PFMEMALLOC page.

Next time, when we get another request from normal allocation,
slab_alloc() enters the slow-path and calls new_slab_objects().  In
new_slab_objects(), we call get_partial() and get a partial slab which
was just deactivated but is a pfmemalloc page.  We extract one object
from it and re-deactivate.

  "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.

As a result, access to PFMEMALLOC page is not properly restricted and it
can cause a performance degradation due to frequent deactivation.
deactivation frequently.

This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()
scenario.  Instead, new_slab() is called.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoslab: fix starting index for finding another object
Joonsoo Kim [Mon, 17 Sep 2012 21:09:06 +0000 (14:09 -0700)]
slab: fix starting index for finding another object

In array cache, there is a object at index 0, check it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>