Linus Torvalds [Tue, 2 Jul 2013 16:39:34 +0000 (09:39 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 update from Ted Ts'o:
"Lots of bug fixes, cleanups and optimizations. In the bug fixes
category, of note is a fix for on-line resizing file systems where the
block size is smaller than the page size (i.e., file systems 1k blocks
on x86, or more interestingly file systems with 4k blocks on Power or
ia64 systems.)
In the cleanup category, the ext4's punch hole implementation was
significantly improved by Lukas Czerner, and now supports bigalloc
file systems. In addition, Jan Kara significantly cleaned up the
write submission code path. We also improved error checking and added
a few sanity checks.
In the optimizations category, two major optimizations deserve
mention. The first is that ext4_writepages() is now used for
nodelalloc and ext3 compatibility mode. This allows writes to be
submitted much more efficiently as a single bio request, instead of
being sent as individual 4k writes into the block layer (which then
relied on the elevator code to coalesce the requests in the block
queue). Secondly, the extent cache shrink mechanism, which was
introduce in 3.9, no longer has a scalability bottleneck caused by the
i_es_lru spinlock. Other optimizations include some changes to reduce
CPU usage and to avoid issuing empty commits unnecessarily."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
ext4: optimize starting extent in ext4_ext_rm_leaf()
jbd2: invalidate handle if jbd2_journal_restart() fails
ext4: translate flag bits to strings in tracepoints
ext4: fix up error handling for mpage_map_and_submit_extent()
jbd2: fix theoretical race in jbd2__journal_restart
ext4: only zero partial blocks in ext4_zero_partial_blocks()
ext4: check error return from ext4_write_inline_data_end()
ext4: delete unnecessary C statements
ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
jbd2: move superblock checksum calculation to jbd2_write_superblock()
ext4: pass inode pointer instead of file pointer to punch hole
ext4: improve free space calculation for inline_data
ext4: reduce object size when !CONFIG_PRINTK
ext4: improve extent cache shrink mechanism to avoid to burn CPU time
ext4: implement error handling of ext4_mb_new_preallocation()
ext4: fix corruption when online resizing a fs with 1K block size
ext4: delete unused variables
ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
jbd2: remove debug dependency on debug_fs and update Kconfig help text
jbd2: use a single printk for jbd_debug()
...
Linus Torvalds [Tue, 2 Jul 2013 16:28:37 +0000 (09:28 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull VFS patches (part 1) from Al Viro:
"The major change in this pile is ->readdir() replacement with
->iterate(), dealing with ->f_pos races in ->readdir() instances for
good.
There's a lot more, but I'd prefer to split the pull request into
several stages and this is the first obvious cutoff point."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
[readdir] constify ->actor
[readdir] ->readdir() is gone
[readdir] convert ecryptfs
[readdir] convert coda
[readdir] convert ocfs2
[readdir] convert fatfs
[readdir] convert xfs
[readdir] convert btrfs
[readdir] convert hostfs
[readdir] convert afs
[readdir] convert ncpfs
[readdir] convert hfsplus
[readdir] convert hfs
[readdir] convert befs
[readdir] convert cifs
[readdir] convert freevxfs
[readdir] convert fuse
[readdir] convert hpfs
reiserfs: switch reiserfs_readdir_dentry to inode
reiserfs: is_privroot_deh() needs only directory inode, actually
...
Dave Chinner [Tue, 2 Jul 2013 12:38:35 +0000 (22:38 +1000)]
sync: don't block the flusher thread waiting on IO
When sync does it's WB_SYNC_ALL writeback, it issues data Io and
then immediately waits for IO completion. This is done in the
context of the flusher thread, and hence completely ties up the
flusher thread for the backing device until all the dirty inodes
have been synced. On filesystems that are dirtying inodes constantly
and quickly, this means the flusher thread can be tied up for
minutes per sync call and hence badly affect system level write IO
performance as the page cache cannot be cleaned quickly.
We already have a wait loop for IO completion for sync(2), so cut
this out of the flusher thread and delegate it to wait_sb_inodes().
Hence we can do rapid IO submission, and then wait for it all to
complete.
Effect of sync on fsmark before the patch:
FSUse% Count Size Files/sec App Overhead
.....
0 640000 4096 35154.6
1026984
0 720000 4096 36740.3
1023844
0 800000 4096 36184.6 916599
0 880000 4096 1282.7
1054367
0 960000 4096 3951.3 918773
0
1040000 4096 40646.2 996448
0
1120000 4096 43610.1 895647
0
1200000 4096 40333.1 921048
And a single sync pass took:
real 0m52.407s
user 0m0.000s
sys 0m0.090s
After the patch, there is no impact on fsmark results, and each
individual sync(2) operation run concurrently with the same fsmark
workload takes roughly 7s:
real 0m6.930s
user 0m0.000s
sys 0m0.039s
IOWs, sync is 7-8x faster on a busy filesystem and does not have an
adverse impact on ongoing async data write operations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ashish Sangwan [Mon, 1 Jul 2013 12:12:41 +0000 (08:12 -0400)]
ext4: optimize starting extent in ext4_ext_rm_leaf()
Both hole punch and truncate use ext4_ext_rm_leaf() for removing
blocks. Currently we choose the last extent as the starting
point for removing blocks:
ex = EXT_LAST_EXTENT(eh);
This is OK for truncate but for hole punch we can optimize the extent
selection as the path is already initialized. We could use this
information to select proper starting extent. The code change in this
patch will not affect truncate as for truncate path[depth].p_ext will
always be NULL.
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 1 Jul 2013 12:12:41 +0000 (08:12 -0400)]
jbd2: invalidate handle if jbd2_journal_restart() fails
If jbd2_journal_restart() fails the handle will have been disconnected
from the current transaction. In this situation, the handle must not
be used for for any jbd2 function other than jbd2_journal_stop().
Enforce this with by treating a handle which has a NULL transaction
pointer as an aborted handle, and issue a kernel warning if
jbd2_journal_extent(), jbd2_journal_get_write_access(),
jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
This commit also fixes a bug where jbd2_journal_stop() would trip over
a kernel jbd2 assertion check when trying to free an invalid handle.
Also move the responsibility of setting current->journal_info to
start_this_handle(), simplifying the three users of this function.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Younger Liu <younger.liu@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Theodore Ts'o [Mon, 1 Jul 2013 12:12:40 +0000 (08:12 -0400)]
ext4: translate flag bits to strings in tracepoints
Translate the bitfields used in various flags argument to strings to
make the tracepoint output more human-readable.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 1 Jul 2013 12:12:40 +0000 (08:12 -0400)]
ext4: fix up error handling for mpage_map_and_submit_extent()
The function mpage_released_unused_page() must only be called once;
otherwise the kernel will BUG() when the second call to
mpage_released_unused_page() tries to unlock the pages which had been
unlocked by the first call.
Also restructure the error handling so that we only give up on writing
the dirty pages in the case of ENOSPC where retrying the allocation
won't help. Otherwise, a transient failure, such as a kmalloc()
failure in calling ext4_map_blocks() might cause us to give up on
those pages, leading to a scary message in /var/log/messages plus data
loss.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Theodore Ts'o [Mon, 1 Jul 2013 12:12:40 +0000 (08:12 -0400)]
jbd2: fix theoretical race in jbd2__journal_restart
Once we decrement transaction->t_updates, if this is the last handle
holding the transaction from closing, and once we release the
t_handle_lock spinlock, it's possible for the transaction to commit
and be released. In practice with normal kernels, this probably won't
happen, since the commit happens in a separate kernel thread and it's
unlikely this could all happen within the space of a few CPU cycles.
On the other hand, with a real-time kernel, this could potentially
happen, so save the tid found in transaction->t_tid before we release
t_handle_lock. It would require an insane configuration, such as one
where the jbd2 thread was set to a very high real-time priority,
perhaps because a high priority real-time thread is trying to read or
write to a file system. But some people who use real-time kernels
have been known to do insane things, including controlling
laser-wielding industrial robots. :-)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Lukas Czerner [Mon, 1 Jul 2013 12:12:39 +0000 (08:12 -0400)]
ext4: only zero partial blocks in ext4_zero_partial_blocks()
Currently if we pass range into ext4_zero_partial_blocks() which covers
entire block we would attempt to zero it even though we should only zero
unaligned part of the block.
Fix this by checking whether the range covers the whole block skip
zeroing if so.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 1 Jul 2013 12:12:39 +0000 (08:12 -0400)]
ext4: check error return from ext4_write_inline_data_end()
The function ext4_write_inline_data_end() can return an error. So we
need to assign it to a signed integer variable to check for an error
return (since copied is an unsigned int).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: stable@vger.kernel.org
jon ernst [Mon, 1 Jul 2013 12:12:39 +0000 (08:12 -0400)]
ext4: delete unnecessary C statements
Comparing unsigned variable with 0 always returns false.
err = 0 is duplicated and unnecessary.
[ tytso: Also cleaned up error handling in ext4_block_zero_page_range() ]
Signed-off-by: "Jon Ernst" <jonernst07@gmx.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Al Viro [Mon, 1 Jul 2013 12:12:38 +0000 (08:12 -0400)]
ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
Both ext3 and ext4 htree_dirblock_to_tree() is just filling the
in-core rbtree for use by call_filldir(). All updates of ->f_pos are
done by the latter; bumping it here (on error) is obviously wrong - we
might very well have it nowhere near the block we'd found an error in.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 1 Jul 2013 12:12:38 +0000 (08:12 -0400)]
jbd2: move superblock checksum calculation to jbd2_write_superblock()
Some of the functions which modify the jbd2 superblock were not
updating the checksum before calling jbd2_write_superblock(). Move
the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
that the checksum is calculated consistently.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: stable@vger.kernel.org
Ashish Sangwan [Mon, 1 Jul 2013 12:12:38 +0000 (08:12 -0400)]
ext4: pass inode pointer instead of file pointer to punch hole
No need to pass file pointer when we can directly pass inode pointer.
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
boxi liu [Mon, 1 Jul 2013 12:12:37 +0000 (08:12 -0400)]
ext4: improve free space calculation for inline_data
In ext4 feature inline_data,it use the xattr's space to store the
inline data in inode.When we calculate the inline data as the xattr,we
add the pad.But in get_max_inline_xattr_value_size() function we count
the free space without pad.It cause some contents are moved to a block
even if it can be
stored in the inode.
Signed-off-by: liulei <lewis.liulei@huawei.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
Joe Perches [Mon, 1 Jul 2013 12:12:37 +0000 (08:12 -0400)]
ext4: reduce object size when !CONFIG_PRINTK
Reduce the object size ~10% could be useful for embedded systems.
Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and
arguments, passing " " to functions when !CONFIG_PRINTK and still
verifying format and arguments with no_printk.
$ size fs/ext4/built-in.o*
text data bss dec hex filename
239375 610 888 240873 3ace9 fs/ext4/built-in.o.new
264167 738 888 265793 40e41 fs/ext4/built-in.o.old
$ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config
# CONFIG_PRINTK is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
CONFIG_EXT4_FS_POSIX_ACL=y
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Zheng Liu [Mon, 1 Jul 2013 12:12:37 +0000 (08:12 -0400)]
ext4: improve extent cache shrink mechanism to avoid to burn CPU time
Now we maintain an proper in-order LRU list in ext4 to reclaim entries
from extent status tree when we are under heavy memory pressure. For
keeping this order, a spin lock is used to protect this list. But this
lock burns a lot of CPU time. We can use the following steps to trigger
it.
% cd /dev/shm
% dd if=/dev/zero of=ext4-img bs=1M count=2k
% mkfs.ext4 ext4-img
% mount -t ext4 -o loop ext4-img /mnt
% cd /mnt
% for ((i=0;i<160;i++)); do truncate -s 64g $i; done
% for ((i=0;i<160;i++)); do cp $i /dev/null &; done
% perf record -a -g
% perf report
This commit tries to fix this problem. Now a new member called
i_touch_when is added into ext4_inode_info to record the last access
time for an inode. Meanwhile we never need to keep a proper in-order
LRU list. So this can avoid to burns some CPU time. When we try to
reclaim some entries from extent status tree, we use list_sort() to get
a proper in-order list. Then we traverse this list to discard some
entries. In ext4_sb_info, we use s_es_last_sorted to record the last
time of sorting this list. When we traverse the list, we skip the inode
that is newer than this time, and move this inode to the tail of LRU
list. When the head of the list is newer than s_es_last_sorted, we will
sort the LRU list again.
In this commit, we break the loop if s_extent_cache_cnt == 0 because
that means that all extents in extent status tree have been reclaimed.
Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
changed to save a local variable in these functions.
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Alexey Khoroshilov [Mon, 1 Jul 2013 12:12:36 +0000 (08:12 -0400)]
ext4: implement error handling of ext4_mb_new_preallocation()
If memory allocation in ext4_mb_new_group_pa() is failed,
it returns error code, ext4_mb_new_preallocation() propages it,
but ext4_mb_new_blocks() ignores it.
An observed result was:
- allocation fail means ext4_mb_new_group_pa() does not update
ext4_allocation_context;
- ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len =
ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead
of number of blocks requested (1);
- that activates update cycle in ext4_splice_branch():
for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here
*(where->p + i) = cpu_to_le32(current_block++);
- it iterates 511 times and corrupts a chunk of memory including inode
structure;
- page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty();
- system hangs with 'scheduling while atomic' BUG.
The patch implements a check for ext4_mb_new_preallocation() error
code and handles its failure as if ext4_mb_regular_allocator() fails.
Found by Linux File System Verification project (linuxtesting.org).
[ Patch restructed by tytso to make the flow of control easier to follow. ]
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Maarten ter Huurne [Mon, 1 Jul 2013 12:12:08 +0000 (08:12 -0400)]
ext4: fix corruption when online resizing a fs with 1K block size
Subtracting the number of the first data block places the superblock
backups one block too early, corrupting the file system. When the block
size is larger than 1K, the first data block is 0, so the subtraction
has no effect and no corruption occurs.
Signed-off-by: Maarten ter Huurne <maarten@treewalker.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org
Linus Torvalds [Sun, 30 Jun 2013 22:13:29 +0000 (15:13 -0700)]
Linux 3.10
Linus Torvalds [Sun, 30 Jun 2013 22:08:15 +0000 (15:08 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull another powerpc fix from Benjamin Herrenschmidt:
"I mentioned that while we had fixed the kernel crashes, EEH error
recovery didn't always recover... It appears that I had a fix for
that already in powerpc-next (with a stable CC).
I cherry-picked it today and did a few tests and it seems that things
now work quite well. The patch is also pretty simple, so I see no
reason to wait before merging it."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/eeh: Fix fetching bus for single-dev-PE
Linus Torvalds [Sun, 30 Jun 2013 22:06:25 +0000 (15:06 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of seven bug fixes. Several fcoe fixes for locking
problems, initiator issues and a VLAN API change, all of which could
eventually lead to data corruption, one fix for a qla2xxx locking
problem which could lead to multiple completions of the same request
(and subsequent data corruption) and a use after free in the ipr
driver. Plus one minor MAINTAINERS file update"
(only six bugfixes in this pull, since I had already pulled the fcoe API
fix directly from Robert Love)
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] ipr: Avoid target_destroy accessing memory after it was freed
[SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines
MAINTAINERS: Fix fcoe mailing list
libfc: extend ex_lock to protect all of fc_seq_send
libfc: Correct check for initiator role
libfcoe: Fix Conflicting FCFs issue in the fabric
Gavin Shan [Wed, 5 Jun 2013 07:34:02 +0000 (15:34 +0800)]
powerpc/eeh: Fix fetching bus for single-dev-PE
While running Linux as guest on top of phyp, we possiblly have
PE that includes single PCI device. However, we didn't return
its PCI bus correctly and it leads to failure on recovery from
EEH errors for single-dev-PE. The patch fixes the issue.
Cc: <stable@vger.kernel.org> # v3.7+
Cc: Steve Best <sbest@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 30 Jun 2013 00:02:48 +0000 (17:02 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"We discovered some breakage in our "EEH" (PCI Error Handling) code
while doing error injection, due to a couple of regressions. One of
them is due to a patch (
37f02195bee9 "powerpc/pci: fix PCI-e devices
rescan issue on powerpc platform") that, in hindsight, I shouldn't
have merged considering that it caused more problems than it solved.
Please pull those two fixes. One for a simple EEH address cache
initialization issue. The other one is a patch from Guenter that I
had originally planned to put in 3.11 but which happens to also fix
that other regression (a kernel oops during EEH error handling and
possibly hotplug).
With those two, the couple of test machines I've hammered with error
injection are remaining up now. EEH appears to still fail to recover
on some devices, so there is another problem that Gavin is looking
into but at least it's no longer crashing the kernel."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pci: Improve device hotplug initialization
powerpc/eeh: Add eeh_dev to the cache during boot
Olof Johansson [Sat, 29 Jun 2013 23:25:14 +0000 (16:25 -0700)]
ARM: dt: Only print warning, not WARN() on bad cpu map in device tree
Due to recent changes and expecations of proper cpu bindings, there are
now cases for many of the in-tree devicetrees where a WARN() will hit
on boot due to badly formatted /cpus nodes.
Downgrade this to a pr_warn() to be less alarmist, since it's not a
new problem.
Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN
without this, the others do not.
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guenter Roeck [Mon, 10 Jun 2013 17:18:08 +0000 (10:18 -0700)]
powerpc/pci: Improve device hotplug initialization
Commit
37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc
platform) fixes a problem with interrupt and DMA initialization on hot
plugged devices. With this commit, interrupt and DMA initialization for
hot plugged devices is handled in the pci device enable function.
This approach has a couple of drawbacks. First, it creates two code paths
for device initialization, one for hot plugged devices and another for devices
known during the initial PCI scan. Second, the initialization code for hot
plugged devices is only called when the device is enabled, ie typically
in the probe function. Also, the platform specific setup code is called each
time pci_enable_device() is called, not only once during device discovery,
meaning it is actually called multiple times, once for devices discovered
during the initial scan and again each time a driver is re-loaded.
The visible result is that interrupt pins are only assigned to hot plugged
devices when the device driver is loaded. Effectively this changes the PCI
probe API, since pci_dev->irq and the device's dma configuration will now
only be valid after pci_enable() was called at least once. A more subtle
change is that platform specific PCI device setup is moved from device
discovery into the driver's probe function, more specifically into the
pci_enable_device() call.
To fix the inconsistencies, add new function pcibios_add_device.
Call pcibios_setup_device from pcibios_setup_bus_devices if device setup
is not complete, and from pcibios_add_device if bus setup is complete.
With this change, device setup code is moved back into device initialization,
and called exactly once for both static and hot plugged devices.
[ This also fixes a regression introduced by the above patch which
causes dev->irq to be overwritten under some cirumstances after
MSIs have been enabled for the device which leads to crashes due
to the MSI core "hijacking" dev->irq to store the base MSI number
and not the LSI. --BenH
]
Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sat, 29 Jun 2013 18:34:18 +0000 (11:34 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a crash in the crypto layer exposed by an SCTP test tool"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algboss - Hold ref count on larval
Linus Torvalds [Sat, 29 Jun 2013 18:32:05 +0000 (11:32 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm/qxl fix from Dave Airlie:
"Bad me forgot an access check, possible security issue, but since this
is the first kernel with it, should be fine to just put it in now"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/qxl: add missing access check for execbuffer ioctl
Mathieu Desnoyers [Fri, 28 Jun 2013 13:49:46 +0000 (09:49 -0400)]
Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation
This __put_user() could be used by unprivileged processes to write into
kernel memory. The issue here is that even if copy_siginfo_to_user()
fails, the error code is not checked before __put_user() is executed.
Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle,
so it has not hit a stable release yet.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 29 Jun 2013 17:31:15 +0000 (10:31 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This is a recently spotted regression in the snapshot behavior...
It turns out several tests weren't being run in the nightlies so this
took a while to spot"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: send snapshot context with writes
Linus Torvalds [Sat, 29 Jun 2013 17:30:31 +0000 (10:30 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull ubifs fixes from Al Viro:
"A couple of ubifs readdir/lseek race fixes. Stable fodder, really
nasty..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
UBIFS: fix a horrid bug
UBIFS: prepare to fix a horrid bug
Linus Torvalds [Sat, 29 Jun 2013 17:28:52 +0000 (10:28 -0700)]
Merge tag 'for-linus-
20130628' of git://git./linux/kernel/git/dhowells/linux-mn10300
Pull two MN10300 fixes from David Howells:
"The first fixes a problem with passing arrays rather than pointers to
get_user() where __typeof__ then wants to declare and initialise an
array variable which gcc doesn't like.
The second fixes a problem whereby putting mem=xxx into the kernel
command line causes init=xxx to get an incorrect value."
* tag 'for-linus-
20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
mn10300: Use early_param() to parse "mem=" parameter
mn10300: Allow to pass array name to get_user()
Linus Torvalds [Sat, 29 Jun 2013 17:27:19 +0000 (10:27 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"Correct an ordering issue in the tick broadcast code. I really wish
we'd get compensation for pain and suffering for each line of code we
write to work around dysfunctional timer hardware."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Fix tick_broadcast_pending_mask not cleared
Linus Torvalds [Sat, 29 Jun 2013 17:26:50 +0000 (10:26 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"One more fix for a recently discovered bug"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable monitoring on setuid processes for regular users
Al Viro [Thu, 23 May 2013 02:22:04 +0000 (22:22 -0400)]
[readdir] constify ->actor
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 23 May 2013 01:44:23 +0000 (21:44 -0400)]
[readdir] ->readdir() is gone
everything's converted to ->iterate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 23 May 2013 01:23:40 +0000 (21:23 -0400)]
[readdir] convert ecryptfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 23 May 2013 01:15:30 +0000 (21:15 -0400)]
[readdir] convert coda
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 23 May 2013 01:06:00 +0000 (21:06 -0400)]
[readdir] convert ocfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 22:37:16 +0000 (18:37 -0400)]
[readdir] convert fatfs
... pox upon the idiotic ioctls; life would be much easier without
those.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 21:07:56 +0000 (17:07 -0400)]
[readdir] convert xfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 20:48:09 +0000 (16:48 -0400)]
[readdir] convert btrfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 20:34:19 +0000 (16:34 -0400)]
[readdir] convert hostfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 20:31:14 +0000 (16:31 -0400)]
[readdir] convert afs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 19:11:27 +0000 (15:11 -0400)]
[readdir] convert ncpfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 18:59:39 +0000 (14:59 -0400)]
[readdir] convert hfsplus
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 18:29:35 +0000 (14:29 -0400)]
[readdir] convert hfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 17:44:05 +0000 (13:44 -0400)]
[readdir] convert befs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 20:17:25 +0000 (16:17 -0400)]
[readdir] convert cifs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 07:15:00 +0000 (03:15 -0400)]
[readdir] convert freevxfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 07:03:58 +0000 (03:03 -0400)]
[readdir] convert fuse
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 06:58:57 +0000 (02:58 -0400)]
[readdir] convert hpfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 02:58:58 +0000 (22:58 -0400)]
reiserfs: switch reiserfs_readdir_dentry to inode
... and clean the callers up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 02:45:29 +0000 (22:45 -0400)]
reiserfs: is_privroot_deh() needs only directory inode, actually
... and that - only to get the superblock. Privroot is a directory
and we don't allow hardlinks to those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 02:42:17 +0000 (22:42 -0400)]
[readdir] convert reiserfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 01:22:31 +0000 (21:22 -0400)]
[readdir] convert ntfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 01:11:59 +0000 (21:11 -0400)]
[readdir] convert isofs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 22:08:49 +0000 (18:08 -0400)]
[readdir] convert jffs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 22:02:17 +0000 (18:02 -0400)]
[readdir] convert f2fs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:51:41 +0000 (17:51 -0400)]
[readdir] convert 9p
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:44:42 +0000 (17:44 -0400)]
[readdir] convert affs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:30:10 +0000 (17:30 -0400)]
[readdir] convert adfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:06:34 +0000 (17:06 -0400)]
[readdir] convert logfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:00:34 +0000 (17:00 -0400)]
[readdir] convert jfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 20:52:26 +0000 (16:52 -0400)]
[readdir] convert ceph
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 20:34:50 +0000 (16:34 -0400)]
[readdir] convert nfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 20:08:53 +0000 (16:08 -0400)]
[readdir] convert ext4
and trim the living hell out bogosities in inline dir case
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 19:32:10 +0000 (15:32 -0400)]
[readdir] convert qnx6
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 19:17:59 +0000 (15:17 -0400)]
[readdir] convert qnx4
... and use strnlen() instead of strlen() - it's done on untrusted data,
after all.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 19:05:25 +0000 (15:05 -0400)]
[readdir] convert omfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 18:36:14 +0000 (14:36 -0400)]
[readdir] convert nilfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 18:31:02 +0000 (14:31 -0400)]
[readdir] convert sysfs
get rid of the kludges in sysfs_readdir()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 18:14:48 +0000 (14:14 -0400)]
[readdir] convert gfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 17:48:17 +0000 (13:48 -0400)]
[readdir] convert exofs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 17:41:48 +0000 (13:41 -0400)]
[readdir] convert bfs
... and get rid of that ridiculous mutex in bfs_readdir()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 16:07:31 +0000 (12:07 -0400)]
[readdir] convert procfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:52:12 +0000 (01:52 -0400)]
[readdir] convert openpromfs
what the hell is op_mutex for, BTW?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:41:10 +0000 (01:41 -0400)]
[readdir] convert efs
* sanity checks belong before risky operation, not after it
* don't quit as soon as we'd found an entry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:28:34 +0000 (01:28 -0400)]
[readdir] convert configfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:22:00 +0000 (01:22 -0400)]
[readdir] convert romfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:17:58 +0000 (01:17 -0400)]
[readdir] convert squashfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:14:46 +0000 (01:14 -0400)]
[readdir] convert ubifs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:09:37 +0000 (01:09 -0400)]
[readdir] convert udf
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 01:02:48 +0000 (21:02 -0400)]
[readdir] convert ext3
new helper: dir_relax(inode). Call when you are in location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*). Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 00:23:06 +0000 (20:23 -0400)]
[readdir] switch dcache_readdir() users to ->iterate()
new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 15 May 2013 22:51:49 +0000 (18:51 -0400)]
[readdir] simple local unixlike: switch to ->iterate()
ext2, ufs, minix, sysv
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 15 May 2013 22:49:12 +0000 (18:49 -0400)]
[readdir] introduce ->iterate(), ctx->pos, dir_emit()
New method - ->iterate(file, ctx). That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...). It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.
Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 15 May 2013 17:52:59 +0000 (13:52 -0400)]
[readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().
struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 May 2013 14:14:07 +0000 (10:14 -0400)]
move linux/loop.h to drivers/block
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 May 2013 14:12:11 +0000 (10:12 -0400)]
compat.c: LOOP_CLR_FD is taken care of in loop.c itself...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:39:26 +0000 (12:39 -0400)]
pxa3xx: VM_IO is set by io_remap_pfn_range()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:38:38 +0000 (12:38 -0400)]
au1100fb: VM_IO is set by io_remap_pfn_range()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:37:38 +0000 (12:37 -0400)]
au1200fb: io_remap_pfn_range() sets VM_IO
... and single return is quite sufficient to get out of function, TYVM
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:33:31 +0000 (12:33 -0400)]
vfio: remap_pfn_range() sets all those flags...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:27:16 +0000 (12:27 -0400)]
i810: VM_IO is set by io_remap_pfn_range()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:23:17 +0000 (12:23 -0400)]
drm: io_remap_pfn_range() sets VM_IO...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:21:55 +0000 (12:21 -0400)]
sparc: __pci_mmap_set_flags() is useless
io_remap_pfn_range() does all we need
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:19:34 +0000 (12:19 -0400)]
mn10300: don't bother with VM_IO
io_remap_pfn_range() sets it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:18:01 +0000 (12:18 -0400)]
hose_mmap_page_range(): io_remap_pfn_range() will set all those flags...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 May 2013 16:15:47 +0000 (12:15 -0400)]
samsung: don't bother with setting VM_IO
io_remap_pfn_range() will set it just fine
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>