Jeremy Fitzhardinge [Mon, 2 Feb 2009 21:55:54 +0000 (13:55 -0800)]
xen: use direct ops on 64-bit
Enable the use of the direct vcpu-access operations on 64-bit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jeremy Fitzhardinge [Mon, 2 Feb 2009 21:55:42 +0000 (13:55 -0800)]
xen: make direct versions of irq_enable/disable/save/restore to common code
Now that x86-64 has directly accessible percpu variables, it can also
implement the direct versions of these operations, which operate on a
vcpu_info structure directly embedded in the percpu area.
In fact, the 64-bit versions are more or less identical, and so can be
shared. The only two differences are:
1. xen_restore_fl_direct takes its argument in eax on 32-bit, and rdi on 64-bit.
Unfortunately it isn't possible to refer directly to the second-least-significant
byte of rdi (as you can with %ah), so the code isn't quite as dense.
2. check_events needs two variants to save different registers.
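As a rough illustration of what "direct" operations on a vcpu_info structure look like, here is a minimal C model. It is a sketch only: the struct, the function names, and the check_events stand-in are hypothetical simplifications, not the kernel's assembly implementation.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for Xen's per-vcpu state. */
struct vcpu_info_sketch {
    uint8_t upcall_pending; /* an event arrived while delivery was masked */
    uint8_t upcall_mask;    /* nonzero: event delivery is masked */
};

#define SKETCH_IF_FLAG 0x200ul /* position of the IF bit in EFLAGS */

static int events_checked; /* counts simulated check_events calls */

/* Stand-in for check_events: deliver whatever was left pending. */
static void check_events_sketch(struct vcpu_info_sketch *v)
{
    events_checked++;
    v->upcall_pending = 0;
}

static void irq_disable_direct(struct vcpu_info_sketch *v)
{
    v->upcall_mask = 1;
}

static void irq_enable_direct(struct vcpu_info_sketch *v)
{
    v->upcall_mask = 0;
    if (v->upcall_pending)
        check_events_sketch(v);
}

/* Report the mask state in the shape of an EFLAGS value. */
static unsigned long save_fl_direct(const struct vcpu_info_sketch *v)
{
    return v->upcall_mask ? 0 : SKETCH_IF_FLAG;
}

static void restore_fl_direct(struct vcpu_info_sketch *v, unsigned long flags)
{
    v->upcall_mask = (flags & SKETCH_IF_FLAG) ? 0 : 1;
    if (!v->upcall_mask && v->upcall_pending)
        check_events_sketch(v);
}
```

In this model, enabling interrupts while an event is pending triggers exactly one simulated check_events call.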
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jeremy Fitzhardinge [Mon, 2 Feb 2009 21:55:31 +0000 (13:55 -0800)]
xen: setup percpu data pointers
We need to access percpu data fairly early, so set up the percpu
registers as soon as possible. We only need to load the appropriate
segment register. We already have a GDT, but it's hard to change it
early because we need to manipulate the pagetable to do so, and that
hasn't been set up yet.
Also, set the kernel stack when bringing up secondary CPUs. If we
don't, they all end up sharing the same stack...
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
H. Peter Anvin [Thu, 5 Feb 2009 00:58:26 +0000 (16:58 -0800)]
Merge branch 'core/percpu' into x86/paravirt
Jeremy Fitzhardinge [Mon, 2 Feb 2009 21:58:06 +0000 (13:58 -0800)]
xen: fix 32-bit build resulting from mmu move
Moving the mmu code from enlighten.c to mmu.c inadvertently broke the
32-bit build. Fix it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jeremy Fitzhardinge [Wed, 4 Feb 2009 00:00:38 +0000 (16:00 -0800)]
x86/paravirt: return full 64-bit result
Impact: Bug fix
A hunk went missing in the original patch, and callee-save callsites were
not marked as returning the upper 32 bits of the result, causing Badness.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Tue, 3 Feb 2009 02:16:19 +0000 (18:16 -0800)]
x86, percpu: fix kexec with vmlinux
Impact: fix regression with kexec with vmlinux
Split data.init into data.init, percpu, data.init2 sections
instead of letting data.init wrap the percpu section.
Thus kexec loading will be happy, because sections will not
overlap.
Before the patch we have:
Elf file type is EXEC (Executable file)
Entry point 0x200000
There are 6 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff80200000 0x0000000000200000
0x0000000000ca6000 0x0000000000ca6000 R E 200000
LOAD 0x0000000000ea6000 0xffffffff80ea6000 0x0000000000ea6000
0x000000000014dfe0 0x000000000014dfe0 RWE 200000
LOAD 0x0000000001000000 0xffffffffff600000 0x0000000000ff4000
0x0000000000000888 0x0000000000000888 RWE 200000
LOAD 0x00000000011f6000 0xffffffff80ff6000 0x0000000000ff6000
0x0000000000073086 0x0000000000a2d938 RWE 200000
LOAD 0x0000000001400000 0x0000000000000000 0x000000000106a000
0x00000000001d2ce0 0x00000000001d2ce0 RWE 200000
NOTE 0x00000000009e2c1c 0xffffffff809e2c1c 0x00000000009e2c1c
0x0000000000000024 0x0000000000000024 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
01 .data .init.rodata .data.cacheline_aligned .data.read_mostly
02 .vsyscall_0 .vsyscall_fn .vsyscall_gtod_data .vsyscall_1 .vsyscall_2 .vgetcpu_mode .jiffies
03 .data.init_task .smp_locks .init.text .init.data .init.setup .initcall.init .con_initcall.init .x86_cpu_dev.init .altinstructions .altinstr_replacement .exit.text .init.ramfs .bss
04 .data.percpu
05 .notes
After patch we've got:
Elf file type is EXEC (Executable file)
Entry point 0x200000
There are 7 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff80200000 0x0000000000200000
0x0000000000ca6000 0x0000000000ca6000 R E 200000
LOAD 0x0000000000ea6000 0xffffffff80ea6000 0x0000000000ea6000
0x000000000014dfe0 0x000000000014dfe0 RWE 200000
LOAD 0x0000000001000000 0xffffffffff600000 0x0000000000ff4000
0x0000000000000888 0x0000000000000888 RWE 200000
LOAD 0x00000000011f6000 0xffffffff80ff6000 0x0000000000ff6000
0x0000000000073086 0x0000000000073086 RWE 200000
LOAD 0x0000000001400000 0x0000000000000000 0x000000000106a000
0x00000000001d2ce0 0x00000000001d2ce0 RWE 200000
LOAD 0x000000000163d000 0xffffffff8123d000 0x000000000123d000
0x0000000000000000 0x00000000007e6938 RWE 200000
NOTE 0x00000000009e2c1c 0xffffffff809e2c1c 0x00000000009e2c1c
0x0000000000000024 0x0000000000000024 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
01 .data .init.rodata .data.cacheline_aligned .data.read_mostly
02 .vsyscall_0 .vsyscall_fn .vsyscall_gtod_data .vsyscall_1 .vsyscall_2 .vgetcpu_mode .jiffies
03 .data.init_task .smp_locks .init.text .init.data .init.setup .initcall.init .con_initcall.init .x86_cpu_dev.init .altinstructions .altinstr_replacement .exit.text .init.ramfs
04 .data.percpu
05 .bss
06 .notes
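The overlap can be checked by hand from the PhysAddr and MemSiz columns in the two listings. A minimal sketch in C follows; the struct and function names are made up for illustration, and only the numeric values come from the readelf output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Physical load address and in-memory size of one LOAD segment. */
struct load_seg {
    uint64_t paddr;
    uint64_t memsz;
};

/* True if the half-open ranges [paddr, paddr + memsz) intersect. */
static bool segs_overlap(struct load_seg a, struct load_seg b)
{
    return a.paddr < b.paddr + b.memsz && b.paddr < a.paddr + a.memsz;
}

/* Segment 03 before the patch: init sections plus .bss. */
static const struct load_seg before_init = { 0x0ff6000, 0xa2d938 };
/* Segment 03 after the patch: init sections only. */
static const struct load_seg after_init  = { 0x0ff6000, 0x073086 };
/* Segment 04 in both listings: .data.percpu. */
static const struct load_seg percpu      = { 0x106a000, 0x1d2ce0 };
/* Segment 05 after the patch: .bss on its own. */
static const struct load_seg after_bss   = { 0x123d000, 0x7e6938 };
```

Before the patch, segment 03 ends at 0x1a23938 and so covers the percpu segment at 0x106a000. After the patch it ends at 0x1069086, and the percpu segment ends at 0x123cce0, just below the new .bss segment.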
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Sat, 31 Jan 2009 07:18:41 +0000 (23:18 -0800)]
x86/vmi: fix interrupt enable/disable/save/restore calling convention.
Zach says:
> Enable/Disable have no clobbers at all.
> Save clobbers only return value, %eax
> Restore also clobbers nothing.
This is precisely compatible with the calling convention, so we can
just call them directly without wrapping.
(Compile tested only.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Sat, 31 Jan 2009 07:17:23 +0000 (23:17 -0800)]
x86/paravirt: don't restore second return reg
Impact: bugfix
In the 32-bit calling convention, %eax:%edx is used to return 64-bit
values. Don't save and restore %edx around wrapped functions, or they
can't return a full 64-bit result.
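To see why restoring %edx is harmful, consider this small C model of the register pair. It is illustrative only: the struct and helper names are hypothetical and stand in for what the wrapper does in assembly.

```c
#include <stdint.h>

/* Model of the 32-bit register pair used for a 64-bit return value. */
struct reg_pair {
    uint32_t eax; /* low 32 bits */
    uint32_t edx; /* high 32 bits */
};

/* Callee side: split a 64-bit result across the pair. */
static struct reg_pair return_u64(uint64_t value)
{
    struct reg_pair r = { (uint32_t)value, (uint32_t)(value >> 32) };
    return r;
}

/* Caller side: reassemble the 64-bit result. */
static uint64_t read_u64(struct reg_pair r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* What a wrapper does if it restores the caller's old %edx after the call. */
static struct reg_pair restore_edx(struct reg_pair r, uint32_t saved_edx)
{
    r.edx = saved_edx;
    return r;
}
```

With the restore in place, the upper half of the result is replaced by whatever %edx held before the call.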
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Ingo Molnar [Sat, 31 Jan 2009 13:27:28 +0000 (14:27 +0100)]
Merge branch 'tj-percpu' of git://git./linux/kernel/git/tj/misc into core/percpu
Tejun Heo [Sat, 31 Jan 2009 05:36:00 +0000 (14:36 +0900)]
Merge branch 'master' into tj-percpu
Jeremy Fitzhardinge [Fri, 30 Jan 2009 08:47:54 +0000 (17:47 +0900)]
xen: setup percpu data pointers
Impact: fix xen booting
We need to access percpu data fairly early, so set up the percpu
registers as soon as possible. We only need to load the appropriate
segment register. We already have a GDT, but it's hard to change it
early because we need to manipulate the pagetable to do so, and that
hasn't been set up yet.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Jeremy Fitzhardinge [Fri, 30 Jan 2009 08:47:54 +0000 (17:47 +0900)]
x86: split loading percpu segments from loading gdt
Impact: split out a function, no functional change
Xen needs to be able to access percpu data from very early on. For
various reasons, it cannot also load the gdt at that time. It does,
however, have a perfectly functional gdt at that point, so there's no
pressing need to reload the gdt.
Split the function to load the segment registers off, so Xen can call
it directly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Brian Gerst [Fri, 30 Jan 2009 08:47:53 +0000 (17:47 +0900)]
x86: pass in cpu number to switch_to_new_gdt()
Impact: cleanup, prepare for xen boot fix.
Xen needs to call this function very early to setup the GDT and
per-cpu segments. Remove the call to smp_processor_id() and just
pass in the cpu number.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cliff Wickman [Thu, 29 Jan 2009 21:35:26 +0000 (15:35 -0600)]
x86: UV fix uv_flush_send_and_wait()
Impact: fix possible tlb mis-flushing on UV
uv_flush_send_and_wait() should return a pointer if the broadcast
remote tlb shootdown requests fail. That causes the conventional IPI
method of shootdown to be used.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Jeremy Fitzhardinge [Thu, 29 Jan 2009 09:51:34 +0000 (01:51 -0800)]
x86/paravirt: fix missing callee-save call on pud_val
Impact: Fix build when CONFIG_PARAVIRT_DEBUG is enabled
Fix missed conversion to using callee-saved calls for pud_val, which
causes a compile error when CONFIG_PARAVIRT_DEBUG is enabled.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:07 +0000 (14:35 -0800)]
x86/paravirt: use callee-saved convention for pte_val/make_pte/etc
Impact: Optimization
In the native case, pte_val, make_pte, etc are all just identity
functions, so there's no need to clobber a lot of registers over them.
(This changes the 32-bit callee-save calling convention to return both
EAX and EDX so functions can return 64-bit values.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:06 +0000 (14:35 -0800)]
x86/paravirt: implement PVOP_CALL macros for callee-save functions
Impact: Optimization
Functions with the callee save calling convention clobber many fewer
registers than the normal C calling convention. Implement variants of
PVOP_V?CALL* accordingly. This only bothers with functions up to 3
args, since functions with more args may as well use the normal
calling convention.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:05 +0000 (14:35 -0800)]
x86/paravirt: add register-saving thunks to reduce caller register pressure
Impact: Optimization
One of the problems with inserting a pile of C calls where previously
there were none is that the register pressure is greatly increased.
The C calling convention says that the caller must expect a certain
set of registers may be trashed by the callee, and that the callee can
use those registers without restriction. This includes the function
argument registers, and several others.
This patch seeks to alleviate this pressure by introducing wrapper
thunks that will do the register saving/restoring, so that the
callsite doesn't need to worry about it, but the callee function can
be conventional compiler-generated code. In many cases (particularly
performance-sensitive cases) the callee will be in assembler anyway,
and need not use the compiler's calling convention.
Standard calling convention is:
         arguments        return  scratch
x86-32   eax edx ecx      eax     ?
x86-64   rdi rsi rdx rcx  rax     r8 r9 r10 r11
The thunk preserves all argument and scratch registers. The return
register is not preserved, and is available as a scratch register for
unwrapped callee code (and of course the return value).
Wrapped function pointers are themselves wrapped in a struct
paravirt_callee_save structure, in order to get some warning from the
compiler when functions with mismatched calling conventions are used.
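The struct-wrapping trick can be sketched in a few lines of C. The names below are hypothetical and the wrapper here does no register saving; the sketch only shows how a distinct struct type makes the compiler object when a plain function pointer is assigned where a wrapped one is expected.

```c
/* Hypothetical wrapper type, modelled on the struct named above. */
struct callee_save_sketch {
    void *func;
};

/*
 * Marks a function as following the callee-save convention. The cast from a
 * function pointer to void * relies on a common compiler extension.
 */
#define CALLEE_SAVE_SKETCH(fn) ((struct callee_save_sketch){ (void *)(fn) })

struct irq_ops_sketch {
    struct callee_save_sketch save_fl; /* only accepts a wrapped function */
    unsigned long (*plain_op)(void);   /* ordinary C calling convention */
};

static unsigned long demo_save_fl(void)
{
    return 0x200;
}

/* Call through the wrapper by casting back to the real signature. */
static unsigned long call_wrapped(struct callee_save_sketch cs)
{
    return ((unsigned long (*)(void))cs.func)();
}
```

Assigning demo_save_fl directly to the save_fl member would fail to compile, because a function pointer is not a struct callee_save_sketch; that is the kind of warning the wrapping is meant to provoke.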
The most common paravirt ops, both statically and dynamically, are
interrupt enable/disable/save/restore, so handle them first. This is
particularly easy since their calls are handled specially anyway.
XXX Deal with VMI. What's their calling convention?
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:04 +0000 (14:35 -0800)]
x86/paravirt: selectively save/restore regs around pvops calls
Impact: Optimization
Each asm paravirt-ops call says what registers are available for
clobbering. This patch makes use of this to selectively save/restore
registers around each pvops call. In many cases this significantly
shrinks code size.
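The selection logic amounts to a mask operation. The following C sketch uses hypothetical one-bit-per-register constants rather than the kernel's own clobber definitions, and covers only a handful of registers.

```c
#include <stdint.h>

/* Hypothetical one-bit-per-register masks for a few x86-64 registers. */
enum {
    REG_RAX = 1u << 0,
    REG_RCX = 1u << 1,
    REG_RDX = 1u << 2,
    REG_RSI = 1u << 3,
    REG_RDI = 1u << 4,
    REG_ALL = REG_RAX | REG_RCX | REG_RDX | REG_RSI | REG_RDI,
};

/*
 * Registers that need a save/restore pair: those the callee may trash
 * that the call site did not declare as available for clobbering.
 */
static uint32_t regs_to_save(uint32_t callee_trashes, uint32_t site_clobbers)
{
    return callee_trashes & ~site_clobbers;
}

/* Number of registers in a mask, i.e. how many push/pop pairs are emitted. */
static unsigned count_regs(uint32_t mask)
{
    unsigned n = 0;
    for (; mask; mask &= mask - 1) /* clear the lowest set bit each pass */
        n++;
    return n;
}
```

A call site that allows every register to be clobbered needs no saves at all, while one that allows only the return register still needs four in this model.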
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:03 +0000 (14:35 -0800)]
x86: fix paravirt clobber in entry_64.S
Impact: Fix latent bug
The clobber is trying to say that anything except RDI is available for
clobbering, but actually clobbers everything. This hasn't mattered
because the clobbers were basically ignored, but subsequent patches
will rely on them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:02 +0000 (14:35 -0800)]
x86/pvops: add a paravirt_ident functions to allow special patching
Impact: Optimization
Several paravirt ops implementations simply return their arguments,
the most obvious being the make_pte/pte_val class of operations on
native.
On 32-bit, the identity function is literally a no-op, as the calling
convention uses the same registers for the first argument and return.
On 64-bit, it can be implemented with a single "mov".
This patch adds special identity functions for 32- and 64-bit arguments,
and machinery to recognize them and replace them with either nops or a
mov as appropriate.
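The recognition step might be sketched as below. All names are hypothetical, and the real patching machinery rewrites instructions at the call site rather than returning an enum; this only models the decision described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Identity helpers for 32- and 64-bit arguments (hypothetical names). */
static uint32_t ident_32_sketch(uint32_t x) { return x; }
static uint64_t ident_64_sketch(uint64_t x) { return x; }

/* An ordinary op that must stay a real call. */
static uint64_t not_identity(uint64_t x) { return x + 1; }

typedef void (*generic_fn)(void);

enum patch_kind { PATCH_CALL, PATCH_NOPS, PATCH_MOV };

/*
 * Decide how a call site could be patched: identity functions become
 * nops on 32-bit (argument and return share a register) or a single
 * mov on 64-bit; anything else remains a call.
 */
static enum patch_kind choose_patch(generic_fn target, bool kernel_is_64bit)
{
    if (target == (generic_fn)ident_32_sketch ||
        target == (generic_fn)ident_64_sketch)
        return kernel_is_64bit ? PATCH_MOV : PATCH_NOPS;
    return PATCH_CALL;
}
```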
At the moment, the only users for the identity functions are the
pagetable entry conversion functions.
The result is a measurable improvement on pagetable-heavy benchmarks
(2-3%, reducing the pvops overhead from 5 to 2%).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Jeremy Fitzhardinge [Wed, 28 Jan 2009 22:35:01 +0000 (14:35 -0800)]
xen: move remaining mmu-related stuff into mmu.c
Impact: Cleanup
Move remaining mmu-related stuff into mmu.c.
A general cleanup, and lay the groundwork for later patches.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
H. Peter Anvin [Fri, 30 Jan 2009 22:50:57 +0000 (14:50 -0800)]
Merge branch 'core/percpu' into x86/paravirt
Tejun Heo [Fri, 30 Jan 2009 07:32:22 +0000 (16:32 +0900)]
linker script: use separate simpler definition for PERCPU()
Impact: fix linker screwup on x86_32
Recent x86_64 zerobased patches introduced PERCPU_VADDR() to put
.data.percpu to a predefined address and re-defined PERCPU() in terms
of it. The new macro defined one extra symbol, __per_cpu_load, for
LMA of the section so that the init data could be accessed. This new
symbol introduced the following problems to x86_32.
1. If __per_cpu_load is defined outside of .data.percpu as an absolute
symbol, relocation generation for relocatable kernel fails due to
absolute relocation.
2. If __per_cpu_load is put inside .data.percpu with absolute address
assignment to work around #1, linker gets confused and under
certain configurations ends up relocating the symbol against
.data.percpu such that the load address gets added on top of
already set load address.
As x86_32 doesn't use predefined address for .data.percpu, there's no
need for it to care about the possibility of __per_cpu_load being
different from __per_cpu_start.
This patch defines PERCPU() separately so that __per_cpu_load is
defined inside .data.percpu so that everything is ordinary
linking-wise.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Fri, 30 Jan 2009 19:37:22 +0000 (11:37 -0800)]
Allow opportunistic merging of VM_CAN_NONLINEAR areas
Commit
de33c8db5910cda599899dd431cc30d7c1018cbf ("Fix OOPS in
mmap_region() when merging adjacent VM_LOCKED file segments") unified
the vma merging of anonymous and file maps to just one place, which
simplified the code and fixed a use-after-free bug that could cause an
oops.
But by doing the merge opportunistically before even having called
->mmap() on the file method, it now compares two different 'vm_flags'
values: the pre-mmap() value of the new not-yet-formed vma, and previous
mappings of the same file around it.
And in doing so, it refused to merge the common file case, which adds a
marker to say "I can be made non-linear".
This fixes it by just adding a set of flags that don't have to match,
because we know they are ok to merge. Currently it's only that single
VM_CAN_NONLINEAR flag, but at least conceptually there could be others
in the future.
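The comparison described above can be sketched in C as follows. The flag values and names are hypothetical placeholders, not the kernel's definitions; the point is only the shape of the test, which ignores a set of flags that are known to be safe to differ.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real ones live in the kernel's mm headers. */
#define SK_VM_READ          0x1ul
#define SK_VM_WRITE         0x2ul
#define SK_VM_LOCKED        0x4ul
#define SK_VM_CAN_NONLINEAR 0x8ul

/* Flags that are allowed to differ between two areas being merged. */
#define SK_MERGEABLE_FLAGS (SK_VM_CAN_NONLINEAR)

/* Two areas may merge if their flags differ only in the mergeable set. */
static bool flags_allow_merge(unsigned long existing, unsigned long incoming)
{
    return ((existing ^ incoming) & ~SK_MERGEABLE_FLAGS) == 0;
}
```

Here an existing file mapping that has already gained the "can be made non-linear" marker still merges with a new, pre-mmap() area that lacks it, while a difference in any other flag still blocks the merge.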
Reported-and-acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 30 Jan 2009 17:23:30 +0000 (18:23 +0100)]
Merge branch 'linus' into core/percpu
Conflicts:
kernel/irq/handle.c
Linus Torvalds [Fri, 30 Jan 2009 16:54:29 +0000 (08:54 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Remove bogus BUG() check in ext4_bmap()
ext4: Fix building with EXT4FS_DEBUG
ext4: Initialize the new group descriptor when resizing the filesystem
ext4: Fix ext4_free_blocks() w/o a journal when files have indirect blocks
jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs"
ext3: Add sanity check to make_indexed_dir
ext4: Add sanity check to make_indexed_dir
ext4: only use i_size_high for regular files
ext4: fix wrong use of do_div
Linus Torvalds [Fri, 30 Jan 2009 16:46:42 +0000 (08:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: Allow RT requests to pre-empt ongoing BE timeslice
block: add sysfs file for controlling io stats accounting
Mark mandatory elevator functions in the biodoc.txt
include/linux: Add bsg.h to the Kernel exported headers
block: silently error an unsupported barrier bio
block: Fix documentation for blkdev_issue_flush()
block: add bio_rw_flagged() for testing bio->bi_rw
block: seperate bio/request unplug and sync bits
block: export SSD/non-rotational queue flag through sysfs
Fix small typo in bio.h's documentation
block: get rid of the manual directory counting in blktrace
block: Allow empty integrity profile
block: Remove obsolete BUG_ON
block: Don't verify integrity metadata on read error
Linus Torvalds [Fri, 30 Jan 2009 16:41:36 +0000 (08:41 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
tulip: fix 21142 with 10Mbps without negotiation
drivers/net/skfp: if !capable(CAP_NET_ADMIN): inverted logic
gianfar: Fix Wake-on-LAN support
smsc911x: timeout reaches -1
smsc9420: fix interrupt signalling test failures
ucc_geth: Change uec phy id to the same format as gianfar's
wimax: fix build issue when debugfs is disabled
netxen: fix memory leak in drivers/net/netxen_nic_init.c
tun: Add some missing TUN compat ioctl translations.
ipv4: fix infinite retry loop in IP-Config
net: update documentation ip aliases
net: Fix OOPS in skb_seq_read().
net: Fix frag_list handling in skb_seq_read
netxen: revert jumbo ringsize
ath5k: fix locking in ath5k_config
cfg80211: print correct intersected regulatory domain
cfg80211: Fix sanity check on 5 GHz when processing country IE
iwlwifi: fix kernel oops when ucode DMA memory allocation failure
rtl8187: Fix error in setting OFDM power settings for RTL8187L
mac80211: remove Michael Wu as maintainer
...
Paul Larson [Fri, 30 Jan 2009 16:21:49 +0000 (10:21 -0600)]
Add enable_ms to jsm driver
This fixes a crash observed when the non-existent enable_ms function is
called for jsm driver.
Signed-off-by: Scott Kilau <Scott.Kilau@digi.com>
Signed-off-by: Paul Larson <pl@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Divyesh Shah [Fri, 30 Jan 2009 11:46:41 +0000 (12:46 +0100)]
cfq-iosched: Allow RT requests to pre-empt ongoing BE timeslice
This patch adds the ability to pre-empt an ongoing BE timeslice when a RT
request is waiting for the current timeslice to complete. This reduces the
wait time to disk for RT requests from an upper bound of 4 (current value
of cfq_quantum) to 1 disk request.
Applied Jens' suggested changes to avoid the rb lookup and use !cfq_class_rt()
and retested.
Latency(secs) for the RT task when doing sequential reads from 10G file.
                       | only RT | RT + BE | RT + BE + this patch
small (512 byte) reads | 143     | 163     | 145
large (1Mb) reads      | 142     | 158     | 146
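The preemption rule described above reduces to a small predicate. This is a hedged sketch with hypothetical names, not the scheduler's actual code, which works on queue structures rather than a bare enum.

```c
#include <stdbool.h>

/* Hypothetical I/O scheduling classes. */
enum io_class_sketch { IO_CLASS_RT, IO_CLASS_BE, IO_CLASS_IDLE };

/*
 * Cut the active timeslice short only when an RT request is waiting
 * and the queue currently being served is not itself RT.
 */
static bool should_preempt_sketch(enum io_class_sketch active,
                                  bool rt_request_waiting)
{
    return rt_request_waiting && active != IO_CLASS_RT;
}
```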
Signed-off-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Fri, 23 Jan 2009 09:54:44 +0000 (10:54 +0100)]
block: add sysfs file for controlling io stats accounting
This allows us to turn off disk stat accounting completely, for the cases
where the 0.5-1% reduction in system time is important.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nikanth Karthikesan [Tue, 27 Jan 2009 08:29:24 +0000 (09:29 +0100)]
Mark mandatory elevator functions in the biodoc.txt
biodoc.txt mentions that elevator functions marked with * are mandatory, but
no function is marked with *. Mark the 3 functions which should be
implemented by any io scheduler.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Boaz Harrosh [Mon, 19 Jan 2009 09:37:38 +0000 (10:37 +0100)]
include/linux: Add bsg.h to the Kernel exported headers
bsg.h in current form is perfectly suitable for user-mode
consumption. It is needed together with scsi/sg.h for applications
that want to interface with the bsg driver.
Currently the few projects that use it would copy it over into
the projects. But that is not acceptable for projects that need
to provide source and devel packages for distros.
This should also be submitted to stable 2.6.28 and 2.6.27 since bsg had
a stable API since these Kernels and distro users will need the header
for these kernels as well.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 13 Jan 2009 14:28:32 +0000 (15:28 +0100)]
block: silently error an unsupported barrier bio
This fixes a "regression" from 2.6.28, where the barrier probes that file
systems may do would trigger additional end request warnings in dmesg.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Theodore Ts'o [Tue, 13 Jan 2009 14:27:32 +0000 (15:27 +0100)]
block: Fix documentation for blkdev_issue_flush()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 6 Jan 2009 08:21:49 +0000 (09:21 +0100)]
block: add bio_rw_flagged() for testing bio->bi_rw
The existing functions for checking bio->bi_rw are badly named. So let's
mirror what we do for bio->bi_flags testing, use a properly named
function so that it's immediately obvious what is being tested.
Maintain compatibility names for the old macros; eventually we'll get
rid of these.
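A helper of this kind, together with a compatibility name layered on top, could look roughly like the sketch below. The struct, the enum values, and the _sketch names are hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit numbers within bi_rw. */
enum bio_rw_flags_sketch {
    SK_BIO_RW = 0,
    SK_BIO_RW_SYNC = 1,
    SK_BIO_RW_BARRIER = 2,
};

/* Cut-down stand-in for struct bio. */
struct bio_sketch {
    unsigned long bi_rw;
};

/* Test a single named flag, so the call site says what is being checked. */
static inline bool bio_rw_flagged_sketch(const struct bio_sketch *bio,
                                         enum bio_rw_flags_sketch flag)
{
    return (bio->bi_rw & (1ul << flag)) != 0;
}

/* Old-style name kept as a thin compatibility wrapper. */
#define bio_barrier_sketch(bio) \
    bio_rw_flagged_sketch((bio), SK_BIO_RW_BARRIER)
```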
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 6 Jan 2009 08:16:05 +0000 (09:16 +0100)]
block: seperate bio/request unplug and sync bits
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Bartlomiej Zolnierkiewicz [Wed, 7 Jan 2009 11:22:39 +0000 (12:22 +0100)]
block: export SSD/non-rotational queue flag through sysfs
For some devices (e.g. CFA ATA) we can't reliably detect whether
the device is of rotational or non-rotational type so we need to
leave the final decision about this setting to the user-space.
As a bonus do a minor CodingStyle fixup in queue_nomerges_store().
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Alberto Bertogli [Mon, 5 Jan 2009 09:18:53 +0000 (10:18 +0100)]
Fix small typo in bio.h's documentation
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 5 Jan 2009 09:17:25 +0000 (10:17 +0100)]
block: get rid of the manual directory counting in blktrace
It can result in a stuck blktrace system, if --kill is used.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Martin K. Petersen [Sun, 4 Jan 2009 07:43:40 +0000 (02:43 -0500)]
block: Allow empty integrity profile
Allow a block device to allocate and register an integrity profile
without providing a template. This allows DM to preallocate a profile
to avoid deadlocks during table reconfiguration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Martin K. Petersen [Sun, 4 Jan 2009 07:43:39 +0000 (02:43 -0500)]
block: Remove obsolete BUG_ON
Now that bio_vecs are no longer cleared in bvec_alloc_bs() the following
BUG_ON must go.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Martin K. Petersen [Sun, 4 Jan 2009 07:43:38 +0000 (02:43 -0500)]
block: Don't verify integrity metadata on read error
If we get an I/O error on a read request there is no point in doing a
verify pass on the integrity buffer. Adjust the completion path
accordingly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Theodore Ts'o [Fri, 30 Jan 2009 05:00:24 +0000 (00:00 -0500)]
ext4: Remove bogus BUG() check in ext4_bmap()
The code to support journal-less ext4 operation added a BUG to
ext4_bmap() which fired if there was no journal and the
EXT4_STATE_JDATA bit was set in the i_state field. This caused
running the filefrag program (which uses the FIBMAP ioctl) to trigger
a BUG().
The EXT4_STATE_JDATA bit is only used for ext4_bmap(), and it's
harmless for the bit to be set. We could add a check in
__ext4_journalled_writepage() and ext4_journalled_write_end() to only
set the EXT4_STATE_JDATA bit if the journal is present, but that adds
an extra test and jump instruction. It's easier to simply remove the
BUG check.
http://bugzilla.kernel.org/show_bug.cgi?id=12568
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Linus Torvalds [Fri, 30 Jan 2009 02:21:14 +0000 (18:21 -0800)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: make sure we allocate enough storage for socket address
[CIFS] Make socket retry timeouts consistent between blocking and nonblocking cases
[CIFS] some cleanup to dir.c prior to addition of posix_open
[CIFS] revalidate parent inode when rmdir done within that directory
[CIFS] Rename md5 functions to avoid collision with new rt modules
cifs: turn smb_send into a wrapper around smb_sendv
Alexander Beregalov [Wed, 28 Jan 2009 23:30:56 +0000 (02:30 +0300)]
sata_sil: Fix build breakage
Commit
e57db7b (SATA Sil: Blacklist system that spins off disks during ACPI power off)
breaks the build as follows, whether or not CONFIG_DMI is set:
drivers/ata/sata_sil.c: In function 'sil_broken_system_poweroff':
drivers/ata/sata_sil.c:713: error: implicit declaration of function 'dmi_first_match'
drivers/ata/sata_sil.c:713: warning: initialization makes pointer from integer without a cast
sata_sil.c should include dmi.h
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bill Nottingham [Fri, 30 Jan 2009 00:28:40 +0000 (16:28 -0800)]
Documentation/Changes: add required versions for new filesystems
btrfs requires version 0.18 of its tools, and squashfs requires 4.0.
ext3 should use and ext4 requires v1.41.4 of e2fsprogs.
Signed-off-by: Bill Nottingham <notting@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Ted Tso <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Fri, 30 Jan 2009 00:28:28 +0000 (16:28 -0800)]
fix emacs indenting howto filename expansion
I don't think emacs understands tilde expansion, so use
"expand-file-name" to do that.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Teemu Likonen [Fri, 30 Jan 2009 00:28:16 +0000 (16:28 -0800)]
Documentation: update CodingStyle tips for Emacs users
With the previous Emacs tips example the kernel style was made available
for files in the kernel-tree only. This patch updates the tip to add a
separate cc-mode indent style ("linux-tabs-only"). This makes it easy to
switch between different indent styles and also makes the kernel style
easily available for any filetype mode (c++, awk, ...) that is managed
by the Emacs cc-mode.
Signed-off-by: Teemu Likonen <tlikonen@iki.fi>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 30 Jan 2009 00:28:02 +0000 (16:28 -0800)]
Documentation: move DMA-mapping.txt to Doc/PCI/
Move DMA-mapping.txt to Documentation/PCI/.
DMA-mapping.txt was supposed to be moved from Documentation/ to
Documentation/PCI/. The 00-INDEX files in those two directories
were updated, along with a few other text files, but the file
itself somehow escaped being moved, so move it and update more
text files and source files with its new location.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 30 Jan 2009 02:14:20 +0000 (18:14 -0800)]
Merge git://git./linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
Staging: poch: fix verification of memory area
Staging: usbip: usbip_start_threads(): handle kernel_thread failure
staging: agnx: drivers/staging/agnx/agnx.h needs <linux/io.h>
Staging: android: task_get_unused_fd_flags: fix the wrong usage of tsk->signal
Staging: android: Add lowmemorykiller documentation.
Staging: android: fix build error on 64bit boxes
Staging: android: timed_gpio: Fix build to build on kernels after 2.6.25.
Staging: android: binder: fix arm build errors
Staging: meilhaus: fix Kbuild
Staging: comedi: fix Kbuild
Linus Torvalds [Fri, 30 Jan 2009 02:14:05 +0000 (18:14 -0800)]
Merge git://git./linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
driver-core: fix kernel-doc parameter name
UIO: Add missing documentation of features added recently
Sync patch for jp_JP/stable_kernel_rules.txt
Linus Torvalds [Fri, 30 Jan 2009 02:13:22 +0000 (18:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ASoC: OMAP: Initialize XCCR and RCCR registers in McBSP DAI driver
ASoC: Fix null string usage with WM8753 DAIs
ALSA: hda - add another MacBook Pro 4, 1 subsystem ID
ALSA: hda - Fix compile warning with CONFIG_SND_JACK=n
ALSA: hda - Add quirk for HP DV6700 laptop
ALSA: hda - Fix PCM reference NID for STAC/IDT analog outputs
Linus Torvalds [Fri, 30 Jan 2009 02:12:58 +0000 (18:12 -0800)]
Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
* 'linux-next' of git://git.infradead.org/ubi-2.6:
UBI: allow direct user-space I/O
UBI: fix resource de-allocation
UBI: remove unused variable
UBI: use nicer 64-bit math
UBI: add ioctl compatibility
UBI: constify file operations
UBI: allow all ioctls
UBI: remove unnecessary header inclusion
UBI: improve ioctl commentaries
UBI: add ioctl for is_mapped operation
UBI: add ioctl for unmap operation
UBI: add ioctl for map operation
Linus Torvalds [Fri, 30 Jan 2009 02:11:02 +0000 (18:11 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: document difference between hid_blacklist and hid_ignore_list
HID: add antec-branded soundgraph imon devices to blacklist
HID: fix reversed logic in disconnect testing of hiddev
HID: adjust report descriptor fixup for MS 1028 receiver
Linus Torvalds [Fri, 30 Jan 2009 02:10:36 +0000 (18:10 -0800)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: Fix a memory leak with the lg object during launcher close
lguest: disable the FORTIFY for lguest.
lguest: typos fix
Linus Torvalds [Fri, 30 Jan 2009 02:09:41 +0000 (18:09 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
ieee1394: sbp2: add workarounds for 2nd and 3rd generation iPods
firewire: sbp2: add workarounds for 2nd and 3rd generation iPods
firewire: sbp2: fix DMA mapping leak on the failure path
firewire: sbp2: define some magic numbers as macros
firewire: sbp2: fix payload limit at S1600 and S3200
ieee1394: sbp2: don't assume zero model_id or firmware_revision if there is none
ieee1394: sbp2: fix payload limit at S1600 and S3200
ieee1394: sbp2: update a help string
ieee1394: support for speeds greater than S800
firewire: core: optimize card shutdown
ieee1394: ohci1394: increase AT req. retries, fix ack_busy_X from Panasonic camcorders and others
firewire: ohci: increase AT req. retries, fix ack_busy_X from Panasonic camcorders and others
firewire: ohci: change "context_stop: still active" log message
firewire: keep highlevel drivers attached during brief connection loss
firewire: unnecessary BM delay after generation rollover
firewire: insist on successive self ID complete events
Andrew Morton [Thu, 29 Jan 2009 22:25:27 +0000 (14:25 -0800)]
drivers/gpu/drm/i915/intel_lvds.c: fix locking snafu
s/unlock/lock/
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12575
Reported-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davide Libenzi [Thu, 29 Jan 2009 22:25:26 +0000 (14:25 -0800)]
epoll: drop max_user_instances and rely only on max_user_watches
Linus suggested putting limits where the money is, and max_user_watches
already does that without the need for max_user_instances. This
mitigates the potential DoS while still allowing a pretty generous
default behavior.
Allowing top 4% of low memory (per user) to be allocated in epoll watches,
we have:
LOMEM    MAX_WATCHES (per user)
512MB    ~178000
1GB      ~356000
2GB      ~712000
As socket buffer math teaches us, a box with 512MB of lomem will have
a hard time hitting 180K watches. No more max_user_instances limits
then.
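The arithmetic behind the table can be sketched as follows. This is only an illustration, not the kernel's code: the per-watch cost of 120 bytes is an assumed value chosen to land near the table's figures, and the helper name max_user_watches() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-watch memory cost in bytes. The real figure depends on
 * kernel structure sizes and is not stated in the commit message. */
#define ASSUMED_WATCH_COST 120u

/* Number of watches a user may hold when 4% (1/25) of low memory
 * may be spent on epoll watches. */
static uint64_t max_user_watches(uint64_t lowmem_bytes, uint64_t watch_cost)
{
	assert(watch_cost > 0);
	return (lowmem_bytes / 25) / watch_cost;
}
```

With 512MB of low memory and the assumed cost, this gives roughly 179000 watches, and the result doubles with each doubling of low memory, matching the shape of the table.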
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frans Pop [Thu, 29 Jan 2009 22:25:25 +0000 (14:25 -0800)]
hp-wmi: set initial docking state
If the initial state is not set when the input device is set up, the first
docking event after the module is loaded will be lost.
Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bharath Ramesh [Thu, 29 Jan 2009 22:25:24 +0000 (14:25 -0800)]
hwmon: applesmc: add support for MacPro 3 temperature sensors
The MacPro 3 has more temperature sensors than the previous MacPros;
also, the sensor THTG has been removed. This patch adds support for the
newer temperature sensors in the MacPro3.
Signed-off-by: Bharath Ramesh <bramesh@vt.edu>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Altobelli [Thu, 29 Jan 2009 22:25:23 +0000 (14:25 -0800)]
hpilo: increment version
Bump hpilo module version to indicate that the open/close bug is fixed.
Signed-off-by: David Altobelli <david.altobelli@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage [Thu, 29 Jan 2009 22:25:22 +0000 (14:25 -0800)]
cgroup: fix root_count when mount fails due to busy subsystem
root_count was being incremented in cgroup_get_sb() after all error
checking was complete, but decremented in cgroup_kill_sb(), which can be
called on a superblock that we gave up on due to an error. This patch
changes cgroup_kill_sb() to only decrement root_count if the root was
previously linked into the list of roots.
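A minimal sketch of the idea, assuming a simplified model in which a flag records whether the root was linked. The names struct root, link_root() and kill_root() are hypothetical and do not come from the cgroup code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a hierarchy root. */
struct root {
	bool linked;	/* true once the root is on the list of roots */
};

static int root_count;

/* Called only after all error checking has succeeded. */
static void link_root(struct root *r)
{
	r->linked = true;
	root_count++;
}

/* May be called on a root that was abandoned due to an error, so the
 * counter is only dropped if it was actually raised for this root. */
static void kill_root(struct root *r)
{
	if (r->linked) {
		r->linked = false;
		root_count--;
	}
	assert(root_count >= 0);
}
```

Killing a root that was never linked leaves the counter untouched, which is the behavior the patch describes.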
Signed-off-by: Paul Menage <menage@google.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage [Thu, 29 Jan 2009 22:25:21 +0000 (14:25 -0800)]
cgroups: add cpu_relax() calls in css_tryget() and cgroup_clear_css_refs()
css_tryget() and cgroup_clear_css_refs() contain polling loops; these
loops should have cpu_relax calls in them to reduce cross-cache traffic.
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 29 Jan 2009 22:25:21 +0000 (14:25 -0800)]
cgroups: fix lock inconsistency in cgroup_clone()
I fixed a bug in cgroup_clone() in Linus' tree in commit 7b574b7
("cgroups: fix a race between cgroup_clone and umount") without noticing
there was a cleanup patch in the -mm tree that should have been rebased
(now commit 104cbd5, "cgroups: use task_lock() for access tsk->cgroups
safe in cgroup_clone()"), which resulted in lock inconsistency.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Thu, 29 Jan 2009 22:25:20 +0000 (14:25 -0800)]
alpha: fix the BUG() macro
The commit "alpha: teach the compiler that BUG doesn't return"
(ed6b9b97f42c091630335bfb71a2931e6f86388b) moved the asm code into an
inline function which takes __FILE__ and __LINE__ as arguments. This
violates the asm constraints there ("i" - an immediate operand with
constant value), so compilation may result in a warning or an error,
depending on the compiler version.
Just adding an infinite loop to the BUG() is sufficient.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Thu, 29 Jan 2009 22:25:19 +0000 (14:25 -0800)]
alpha: compile fixes
- jensen build: fix conflicting declarations for pci_alloc_consistent()
and undefined virt_to_phys();
- SMP: arch/alpha/kernel/smp.c:124: warning: passing argument 2
of '__cpu_test_and_set' discards qualifiers from pointer target type
Interestingly, this only happens with gcc-4.2; gcc <= 4.1 and gcc-4.3
are OK. Fixed with extra assignment.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Thu, 29 Jan 2009 22:25:18 +0000 (14:25 -0800)]
alpha: use syscall wrappers
Convert OSF syscalls and add alpha specific SYSCALL_ALIAS() macro.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 29 Jan 2009 22:25:17 +0000 (14:25 -0800)]
memcg: NULL pointer dereference at rmdir on some NUMA systems
N_POSSIBLE doesn't mean there is memory... and force_empty can
visit an invalid node which has no pgdat.
To visit all valid nodes, N_HIGH_MEMORY should be used.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Buell [Thu, 29 Jan 2009 22:25:16 +0000 (14:25 -0800)]
fbdev: incorrect URL given in drivers/video/Kconfig
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frans Pop [Thu, 29 Jan 2009 22:25:14 +0000 (14:25 -0800)]
hp-wmi: fix regressions caused by missing if statement
Error was introduced in commit fe8e4e039dc3 ("hp-wmi: handle
rfkill_register() failure").
Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Matthew Garrett <mjg@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 29 Jan 2009 22:25:14 +0000 (14:25 -0800)]
memcg: update document to mention that swapoff should be tested
Considering the recently found problem "memcg: fix refcnt handling at
swapoff", it's better to mention swapoff behavior in the memcg_test
document.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 29 Jan 2009 22:25:13 +0000 (14:25 -0800)]
memcg: fix refcnt handling at swapoff
Now, at swapoff, even while try_charge() fails, commit is executed. This
is a bug which turns the refcnt of cgroup_subsys_state negative.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Magnus Damm [Thu, 29 Jan 2009 22:25:12 +0000 (14:25 -0800)]
gpiolib: fix request related issue
Fix request-already-requested handling in gpio_request().
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daisuke Nishimura [Thu, 29 Jan 2009 22:25:11 +0000 (14:25 -0800)]
memcg: get/put parents at create/free
The lifetime of struct cgroup and struct mem_cgroup is different and
mem_cgroup has its own reference count for handling references from
swap_cgroup.
This causes a strange problem: the parent mem_cgroup can die while a
child mem_cgroup is still alive, which triggers a bug when
use_hierarchy==1 because res_counter_uncharge climbs up the tree.
This patch avoids it by getting the parent at create and putting it at
free.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 29 Jan 2009 22:25:10 +0000 (14:25 -0800)]
cgroups: use hierarchy mutex in creation failure path
Now, cgrp->sibling is handled under the hierarchy mutex.
The error route should do so, too.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Evgeniy Polyakov [Thu, 29 Jan 2009 22:25:09 +0000 (14:25 -0800)]
mm: OOM documentation update
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masami Hiramatsu [Thu, 29 Jan 2009 22:25:08 +0000 (14:25 -0800)]
kprobes: fix module compilation error with CONFIG_KPROBES=n
Define kprobes-related data structures even if CONFIG_KPROBES is not set.
This fixes compilation errors which occur in modules that use kprobes
when CONFIG_KPROBES is not set.
[akpm@linux-foundation.org: fix build for non-kprobes-supporting architectures]
Reviewed-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Thu, 29 Jan 2009 22:25:07 +0000 (14:25 -0800)]
sgi-xpc: fix up stale DBUG_ON statements
Clean up the stale DBUG_ON checks and add a couple new ones.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Thu, 29 Jan 2009 22:25:07 +0000 (14:25 -0800)]
sgi-xpc: Remove NULL pointer dereference.
If the bte copy fails, the attempt to retrieve payloads results in a
NULL pointer dereference rather than returning NULL as expected.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Thu, 29 Jan 2009 22:25:06 +0000 (14:25 -0800)]
sgi-xpc: ensure flags are updated before bte_copy
The clearing of msg->flags needs a barrier between it and the
notification of the channel threads that the messages are cleaned and
ready for use.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 30 Jan 2009 01:46:42 +0000 (17:46 -0800)]
Fix OOPS in mmap_region() when merging adjacent VM_LOCKED file segments
As of commit ba470de43188cdbff795b5da43a1474523c6c2fb ("map: handle
mlocked pages during map, remap, unmap") we now use the 'vma' variable
at the end of mmap_region() to handle the page-in of newly mapped
mlocked pages.
However, if we merged adjacent vma's together, the vma we're using may
be stale. We historically consciously avoided using it after the merge
operation, but that got overlooked when redoing the locked page
handling.
This commit simplifies mmap_region() by doing any vma merges early,
avoiding the issue entirely, and 'vma' will always be valid. As pointed
out by Hugh Dickins, this depends on any drivers that change the page
offset or flags to have set one of the VM_SPECIAL bits (so that they
cannot trigger the early merge logic), but that's true in general.
Reported-and-tested-by: Maksim Yevmenkin <maksim.yevmenkin@gmail.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philippe De Muyter [Fri, 30 Jan 2009 01:35:04 +0000 (17:35 -0800)]
tulip: fix 21142 with 10Mbps without negotiation
With current kernels, tulip 21142 ethernet controllers fail to connect
to a 10Mbps-only (i.e. without negotiation partner) network. It used
to work in 2.4 kernels. Fix that. Tested on a 21142 Rev 0x11.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roel Kluin [Fri, 30 Jan 2009 01:32:20 +0000 (17:32 -0800)]
drivers/net/skfp: if !capable(CAP_NET_ADMIN): inverted logic
Fix inverted logic
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Vorontsov [Fri, 30 Jan 2009 01:31:13 +0000 (17:31 -0800)]
gianfar: Fix Wake-on-LAN support
Commit 0f0ca340e57bd7446855fefd07a64249acf81223 ("phy: power
management support") caused a regression in the gianfar driver.
Now phylib turns off PHY power during suspend, and thus WOL
doesn't work anymore.
This patch works around the issue by enabling wakeup in the MDIO
device, i.e. it just restores the old behaviour for the gianfar
driver. Note that this way all PHYs on a given MDIO bus won't
be turned off during suspend, which isn't good from the power
saving point of view.
Proper per-netdevice wakeup management support will need
somewhat reworked phylib suspend/resume logic.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roel Kluin [Fri, 30 Jan 2009 01:30:00 +0000 (17:30 -0800)]
smsc911x: timeout reaches -1
With a postfix decrement the timeout will reach -1 rather than 0,
so the warning will not be issued.
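The off-by-one can be reproduced in isolation. The sketch below is not the driver's code: poll_postfix(), poll_prefix() and never_ready() are hypothetical helpers, and switching to a prefix decrement is shown as one common way to make a `timeout == 0` check meaningful, not necessarily what this patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a hardware status check that never succeeds. */
static bool never_ready(void)
{
	return false;
}

/* Postfix form: when the budget runs out, the final test sees 0 and
 * then decrements once more, leaving the counter at -1. */
static int poll_postfix(int tries, bool (*ready)(void))
{
	int timeout = tries;

	assert(tries > 0);
	while (timeout-- && !ready())
		;
	return timeout;
}

/* Prefix form: the counter is exactly 0 when the budget runs out. */
static int poll_prefix(int tries, bool (*ready)(void))
{
	int timeout = tries;

	assert(tries > 0);
	while (--timeout && !ready())
		;
	return timeout;
}
```

A caller that tests `timeout == 0` after the postfix loop never sees the exhausted case, because that case leaves -1 behind.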
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Glendinning [Fri, 30 Jan 2009 01:29:15 +0000 (17:29 -0800)]
smsc9420: fix interrupt signalling test failures
smsc9420 performs an interrupt signalling test when the interface is
brought up. The current code mistakenly sets its test flag to false
AFTER enabling the software interrupt source, making failure quite
likely.
This patch changes the code to set the test flag BEFORE enabling
interrupts. I've also removed an smp_wmb because the following spinlock
provides an implicit memory barrier.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiying Wang [Fri, 30 Jan 2009 01:28:04 +0000 (17:28 -0800)]
ucc_geth: Change uec phy id to the same format as gianfar's
The commit b31a1d8b41513b96e9c7ec2f68c5734cef0b26a4 ("gianfar: Convert
gianfar to an of_platform_driver") changes the gianfar's phy id to the
format like "mdio@xxxx:xx", but uec still uses the old format like
"xxxxxxxx:xx". For the board whose UEC uses gianfar-mdio like
MPC8568MDS, the phy can not be attached because of the incompatible
phy id format. This patch changes uec's phy id to the same format as
gianfar's.
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inaky Perez-Gonzalez [Fri, 30 Jan 2009 01:18:31 +0000 (17:18 -0800)]
wimax: fix build issue when debugfs is disabled
As reported by Toralf Förster and Randy Dunlap.
- http://linuxwimax.org/pipermail/wimax/2009-January/000460.html
- http://lkml.org/lkml/2009/1/29/279
The definitions needed for the wimax stack and i2400m driver debug
infrastructure were, by mistake, compiled depending on CONFIG_DEBUG_FS
(because they were placed in the debugfs.c files); thus the build broke
in 2.6.29-rc3 when debugging was enabled (CONFIG_WIMAX_DEBUG) and
DEBUG_FS was disabled.
These definitions are always needed if debug is enabled at compile
time (regardless of whether DEBUG_FS is enabled), so moving them
to a file that is always compiled fixes the issue.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Wallis [Mon, 26 Jan 2009 06:32:35 +0000 (17:32 +1100)]
lguest: Fix a memory leak with the lg object during launcher close
Fix a memory leak identified by Rusty Russell during LCA09 by
kfree'ing the lg object instead of just clearing it when the
launcher closes.
Signed-off-by: Mark Wallis <mwallis@serialmonkey.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tim 'mithro' Ansell [Thu, 22 Jan 2009 04:06:41 +0000 (15:06 +1100)]
lguest: disable the FORTIFY for lguest.
Makes all the warnings go away when compiling lguest on Ubuntu
Intrepid or later.
Signed-off-by: Timothy R Ansell <mithro@mithis.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Atsushi SAKAI [Fri, 16 Jan 2009 11:39:14 +0000 (20:39 +0900)]
lguest: typos fix
3 points
lguest_asm.S => i386_head.S
LHCALL_BREAK => LHREQ_BREAK
perferred => preferred
Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Daniel Marjamäki [Thu, 29 Jan 2009 08:55:56 +0000 (08:55 +0000)]
netxen: fix memory leak in drivers/net/netxen_nic_init.c
For kernel bugzilla #12537:
http://bugzilla.kernel.org/show_bug.cgi?id=12537
Free memory.
Signed-off-by: Daniel Marjamäki <danielm77@spray.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 30 Jan 2009 00:53:35 +0000 (16:53 -0800)]
tun: Add some missing TUN compat ioctl translations.
Based upon a report from Michael Tokarev <mjt@tls.msk.ru>:
Just saw in dmesg:
ioctl32(kvm:4408): Unknown cmd fd(9) cmd(800454cf){t:'T';sz:4} arg(ffc668e4) on /dev/net/tun
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Zores [Fri, 30 Jan 2009 00:19:13 +0000 (16:19 -0800)]
ipv4: fix infinite retry loop in IP-Config
Signed-off-by: Benjamin Zores <benjamin.zores@alcatel-lucent.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Fri, 30 Jan 2009 00:16:31 +0000 (16:16 -0800)]
net: update documentation ip aliases
This documentation is old. Add a short note to describe why aliases
are no longer necessary, and remove the old contact/edit info.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shyam Iyer [Fri, 30 Jan 2009 00:12:42 +0000 (16:12 -0800)]
net: Fix OOPS in skb_seq_read().
It oopsed for me in skb_seq_read. addr2line said it was
linux-2.6/net/core/skbuff.c:2228, which is this line:
while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
I added some printks in there and it looks like we hit this:
} else if (st->root_skb == st->cur_skb &&
skb_shinfo(st->root_skb)->frag_list) {
st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
st->frag_idx = 0;
goto next_skb;
}
Actually I did some testing and added a few printks and found that the
st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
This caused the kernel panic.
if (abs_offset < block_limit) {
- *data = st->cur_skb->data + abs_offset;
+ *data = st->cur_skb->data + (abs_offset - st->stepped_offset);
I enabled the debug_tcp and with a few printks found that the code did
not go to the next_skb label and could find that the sequence being
followed was this -
It hit this if condition -
if (st->cur_skb->next) {
st->cur_skb = st->cur_skb->next;
st->frag_idx = 0;
goto next_skb;
And so, now the st pointer is shifted to the next skb whereas actually
it should have hit the second else if first since the data is in the
frag_list.
else if (st->root_skb == st->cur_skb &&
skb_shinfo(st->root_skb)->frag_list) {
st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
goto next_skb;
}
Reversing the two conditions the attached patch fixes the issue for me
on top of Herbert's patches.
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 30 Jan 2009 00:07:52 +0000 (16:07 -0800)]
net: Fix frag_list handling in skb_seq_read
The frag_list handling was broken in skb_seq_read:
1) We didn't add the stepped offset when looking at the head
area of fragments other than the first.
2) We didn't take the stepped offset away when setting the data
pointer in the head area.
3) The frag index wasn't reset.
This patch fixes all three issues.
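The role of the stepped offset can be illustrated with a much simpler model than an skb. The sketch below assumes a plain linked chain of byte buffers; struct seg and seg_at() are hypothetical names and this is not the kernel's skb_seq_read().

```c
#include <assert.h>
#include <stddef.h>

/* One buffer in a chain, standing in for the head area of one skb. */
struct seg {
	const unsigned char *data;
	size_t len;
	struct seg *next;
};

/* Return a pointer to the byte at abs_offset within the whole chain,
 * or NULL if abs_offset lies past the end of the chain. */
static const unsigned char *seg_at(const struct seg *s, size_t abs_offset)
{
	size_t stepped_offset = 0;	/* bytes in buffers already passed */

	for (; s; s = s->next) {
		/* The limit must include what was already stepped over. */
		size_t block_limit = stepped_offset + s->len;

		assert(s->data != NULL);
		if (abs_offset < block_limit)
			/* Take the stepped offset away again so the index
			 * is relative to this buffer's own data. */
			return s->data + (abs_offset - stepped_offset);
		stepped_offset = block_limit;
	}
	return NULL;
}
```

Omitting either adjustment only works for the first buffer, where the stepped offset is zero, which is consistent with the bug showing up once data lives in the frag_list.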
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>