Tejun Heo [Tue, 20 Jul 2010 22:18:07 +0000 (15:18 -0700)]
vfs: don't hold s_umount over close_bdev_exclusive() call
Fix an obscure AB-BA deadlock in get_sb_bdev().
When a superblock is mounted more than once get_sb_bdev() calls
close_bdev_exclusive() to drop the extra bdev reference while holding
s_umount. However, sb->s_umount nests inside bd_mutex during
__invalidate_device() and close_bdev_exclusive() acquires bd_mutex during
blkdev_put(); thus creating an AB-BA deadlock.
This condition doesn't trigger frequently. For this condition to be
visible to lockdep, the filesystem must occupy the whole device (as
__invalidate_device() only grabs bd_mutex for the whole device), the FS
must be mounted more than once and partition rescan should be issued while
the FS is still mounted.
Fix it by dropping s_umount over close_bdev_exclusive().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ciprian Docan <docan@eden.rutgers.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
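A minimal user-space sketch of the lock ordering described above, assuming toy locks that only record whether they are held. All toy_* names are hypothetical and do not exist in the kernel; the real s_umount is a rw-semaphore, and the sketch assumes s_umount is retaken after the call.

```c
#include <stdbool.h>

/* Toy locks: just enough state to observe the acquisition order. */
struct toy_lock { bool held; };

static struct toy_lock s_umount;  /* stands in for sb->s_umount */
static struct toy_lock bd_mutex;  /* stands in for bdev->bd_mutex */
static int order_violations;      /* bd_mutex taken while s_umount was held */

static void toy_acquire(struct toy_lock *l) { l->held = true; }
static void toy_release(struct toy_lock *l) { l->held = false; }

/* Models close_bdev_exclusive(): blkdev_put() takes bd_mutex inside. */
static void toy_close_bdev(void)
{
    if (s_umount.held)
        order_violations++;   /* inverse of the __invalidate_device() order */
    toy_acquire(&bd_mutex);
    toy_release(&bd_mutex);
}

/* Old behaviour: the extra bdev reference is dropped with s_umount held. */
int toy_drop_ref_old(void)
{
    order_violations = 0;
    toy_acquire(&s_umount);
    toy_close_bdev();
    toy_release(&s_umount);
    return order_violations;
}

/* Fixed behaviour: s_umount is released around the call, then retaken. */
int toy_drop_ref_fixed(void)
{
    order_violations = 0;
    toy_acquire(&s_umount);
    toy_release(&s_umount);   /* drop s_umount over the bdev close */
    toy_close_bdev();
    toy_acquire(&s_umount);   /* retake for the rest of the mount path */
    toy_release(&s_umount);
    return order_violations;
}
```

The counter plays the role lockdep plays in the kernel: it flags the inverted order even when no second thread is present to actually deadlock.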
Artem Bityutskiy [Mon, 5 Jul 2010 12:15:04 +0000 (15:15 +0300)]
sysv: do not mark superblock dirty on remount
No need to mark the superblock as dirty in sysv_remount, synchronize
it instead (only if mounting R/O).
I did not find any docs about this file-system, and I have no possibility
to test my changes. Thus, this is untested. I see other issues in sysv,
e.g., why does sysv_sync_fs write only in the FSTYPE_SYSV4 case? However,
it marks its SB bh's dirty for all types, and does not wait for them
ever. With zero docs I'm unable to fix this.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Mon, 5 Jul 2010 12:15:03 +0000 (15:15 +0300)]
sysv: do not mark superblock dirty on mount
I did not find any docs about this file-system, and I have no possibility
to test my changes. Thus, this is untested.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Mon, 5 Jul 2010 12:15:02 +0000 (15:15 +0300)]
btrfs: remove junk sb_dirt change
BTRFS does not define a '->write_super()' method, so it should
not mark its superblock as dirty. This looks like some left-over.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Mon, 5 Jul 2010 12:15:01 +0000 (15:15 +0300)]
BFS: clean up the superblock usage
BFS is a very simple FS and its superblock contains only static
information and is never changed. However, the BFS code for some
mysterious reason marked its buffer head as dirty from time to
time, but nothing in that buffer was ever changed.
This patch removes all the BFS superblock manipulation, simply
because it is not needed. It removes:
1. The si_sbh field from 'struct bfs_sb_info' because it is not
needed. We only need to read the SB once on mount to get the
start of data blocks and the FS size. After this, we can forget
about the SB.
2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because
it is never changed.
3. The '->sync_fs()' method because there is nothing to sync
(inodes are synched by VFS).
4. The '->write_super()' method, again, because the SB is never
changed.
Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Mon, 5 Jul 2010 12:15:00 +0000 (15:15 +0300)]
AFFS: wait for sb synchronization when needed
AFFS does not ever wait for superblock synchronization in
->put_super(), ->write_super(), and ->sync_fs().
However, it should wait for synchronization in ->put_super() because
it is about to be unmounted, in ->write_super() because this is
periodic SB synchronization performed from a separate kernel thread,
and in ->sync_fs() it should respect the 'wait' flag. This patch fixes
the situation.
Also, in ->put_super(), do not write the SB if it is not dirty.
Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Mon, 5 Jul 2010 12:14:59 +0000 (15:14 +0300)]
AFFS: clean up dirty flag usage
In 'affs_write_super()': remove ancient and wrong commented code,
remove unneeded 'clean' variable, so the function becomes a bit
cleaner and simpler.
In 'affs_remount()': remove unnecessary SB dirty flag changes.
Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Sun, 18 Jul 2010 21:51:21 +0000 (17:51 -0400)]
cifs: truncate fallout
Remove the calls to inode_newsize_ok given that we already did it as
part of inode_change_ok in the beginning of cifs_setattr_(no)unix.
No need to call ->truncate if cifs doesn't have one, so remove the
explicit call in cifs_vmtruncate, and replace the calls to vmtruncate
with truncate_setsize which is vmtruncate minus inode_newsize_ok
and the call to ->truncate.
Rename cifs_vmtruncate to cifs_setsize to match the new calling conventions.
Question 1: why does cifs do the pagecache munging and i_size update twice
for each setattr call, once opencoded in cifs_vmtruncate, and once
using the VFS helpers?
Question 2: what is supposed to be protected by i_lock in cifs_vmtruncate?
Do we need it around the call to inode_change_ok?
[AV: fixed build breakage]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Andreas Gruenbacher [Wed, 21 Jul 2010 17:44:45 +0000 (19:44 +0200)]
mbcache: fix shrinker function return value
The shrinker function is supposed to return the number of cache
entries after shrinking, not before shrinking. Fix that.
Based on a patch from Wang Sheng-Hui <crosslonelyover@gmail.com>.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
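A minimal sketch of the return-value convention described above, assuming a toy cache that only tracks its entry count. The toy_* names are hypothetical and this is not the mbcache code.

```c
/* Toy cache: only the number of cached entries matters here. */
struct toy_cache { int nr_entries; };

/*
 * Free up to nr_to_scan entries and report how many entries remain.
 * A request of 0 frees nothing and acts as a pure query.
 */
int toy_shrink(struct toy_cache *cache, int nr_to_scan)
{
    int freed = nr_to_scan;

    if (freed < 0)
        freed = 0;
    if (freed > cache->nr_entries)
        freed = cache->nr_entries;

    cache->nr_entries -= freed;

    /* Return the count AFTER shrinking, not the count on entry. */
    return cache->nr_entries;
}
```

Returning the count from before shrinking would make the caller believe the cache is larger than it is and ask for more reclaim than warranted.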
Andreas Gruenbacher [Mon, 19 Jul 2010 16:19:41 +0000 (18:19 +0200)]
mbcache: Remove unused features
The mbcache code was written to support a variable number of indexes,
but all the existing users use exactly one index. Simplify the code to
support only that case.
There are also no users of the cache entry free operation, and none of
the users keep extra data in cache entries. Remove those features as
well.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Wed, 7 Jul 2010 16:53:25 +0000 (18:53 +0200)]
add f_flags to struct statfs(64)
Add a flags field to help glibc implement statvfs(3) efficiently.
We copy the flag values from glibc, and add a new ST_VALID flag to
denote that f_flags is implemented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
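A minimal sketch of how a statvfs(3) implementation might consume the new field, assuming it treats the validity bit as "the kernel filled in f_flags" and falls back to another method otherwise. The TOY_* constants and their values are illustrative assumptions, not taken from the patch.

```c
/* Illustrative flag bits; the values are assumptions for this sketch. */
#define TOY_ST_RDONLY 0x0001UL
#define TOY_ST_NOSUID 0x0002UL
#define TOY_ST_VALID  0x0020UL   /* "f_flags is implemented" marker */

/*
 * Returns 1 and stores the mount flags in *out when the kernel
 * provided them; returns 0 when the caller must fall back (for
 * example by parsing the mount table), leaving *out untouched.
 */
int toy_flags_from_statfs(unsigned long f_flags, unsigned long *out)
{
    if (!(f_flags & TOY_ST_VALID))
        return 0;

    /* Strip the marker so only real mount flags are reported. */
    *out = f_flags & ~TOY_ST_VALID;
    return 1;
}
```

The separate validity bit is what lets user space distinguish "no flags set" from "an older kernel that never wrote the field".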
Christoph Hellwig [Wed, 7 Jul 2010 16:53:11 +0000 (18:53 +0200)]
pass a struct path to vfs_statfs
We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:
- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to call the lower filesystem's statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
won't be able to fill out the flags field later on.
In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 8 Jun 2010 04:37:12 +0000 (00:37 -0400)]
update VFS documentation for method changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 18:35:46 +0000 (14:35 -0400)]
All filesystems that need invalidate_inode_buffers() are doing that explicitly
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 18:34:48 +0000 (14:34 -0400)]
convert remaining ->clear_inode() to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 17:43:19 +0000 (13:43 -0400)]
Make ->drop_inode() just return whether inode needs to be dropped
... and let iput_final() do the actual eviction or retention
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
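A minimal sketch of the new division of labour, assuming a toy inode with only a link count and a hashed flag. The toy_* names are hypothetical; the default policy shown is an assumption modelled on "drop when unlinked or unhashed".

```c
#include <stddef.h>

struct toy_inode {
    unsigned int nlink;   /* on-disk link count */
    int hashed;           /* non-zero while the inode is in the hash */
};

/* A ->drop_inode() instance now only answers a yes/no question. */
typedef int (*toy_drop_fn)(struct toy_inode *);

/* Assumed default policy: drop if unlinked or not hashed. */
int toy_generic_drop(struct toy_inode *inode)
{
    return inode->nlink == 0 || !inode->hashed;
}

/* A filesystem that never wants to cache unused inodes. */
int toy_always_drop(struct toy_inode *inode)
{
    (void)inode;
    return 1;
}

enum toy_fate { TOY_RETAINED, TOY_EVICTED };

/* The last-reference path decides what to do with the answer. */
enum toy_fate toy_iput_final(struct toy_inode *inode, toy_drop_fn drop)
{
    if (drop == NULL)
        drop = toy_generic_drop;

    return drop(inode) ? TOY_EVICTED : TOY_RETAINED;
}
```

Keeping eviction and retention in one caller means individual filesystems no longer duplicate that logic in their own methods.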
Al Viro [Mon, 7 Jun 2010 17:23:20 +0000 (13:23 -0400)]
fs/inode.c:clear_inode() is gone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 17:21:05 +0000 (13:21 -0400)]
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 17:20:09 +0000 (13:20 -0400)]
->delete_inode() is gone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 17:16:22 +0000 (13:16 -0400)]
convert ext4 to ->evict_inode()
pretty much brute-force...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 17:11:34 +0000 (13:11 -0400)]
convert logfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 16:22:31 +0000 (12:22 -0400)]
logfs: get rid of magical inodes
ordering problems at ->kill_sb() time are solved by doing iput()
of these suckers in ->put_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 15:55:00 +0000 (11:55 -0400)]
convert nilfs2 to ->evict_inode()
[folded build fix from sfr]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 15:42:26 +0000 (11:42 -0400)]
convert exofs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 15:37:37 +0000 (11:37 -0400)]
convert reiserfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 15:35:40 +0000 (11:35 -0400)]
convert btrfs to ->evict_inode()
NB: do we want btrfs_wait_ordered_range() on eviction of
inodes with positive i_nlink on subvolume with zero root_refs?
If not, btrfs_evict_inode() can be simplified by unconditionally
bailing out in case of i_nlink > 0 in the very beginning...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 15:05:19 +0000 (11:05 -0400)]
switch gfs2 to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 9 Jun 2010 01:28:10 +0000 (21:28 -0400)]
convert ocfs2 to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 04:45:56 +0000 (00:45 -0400)]
switch ncpfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 04:43:39 +0000 (00:43 -0400)]
switch udf to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 04:34:05 +0000 (00:34 -0400)]
switch ubifs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 04:28:54 +0000 (00:28 -0400)]
switch jfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 04:18:40 +0000 (00:18 -0400)]
switch hpfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 04:12:50 +0000 (00:12 -0400)]
switch hppfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 03:49:18 +0000 (23:49 -0400)]
try to get rid of races in hostfs open()
In case of mode mismatch, do *not* blindly close the descriptor
other openers might be using right now. Open the underlying
file with currently sufficient mode, then
* if current mode has grown so that it's sufficient for
us now, just close our new fd
* if current mode has grown and our fd is *not* enough
to cover it, close and repeat.
* otherwise, install our fd if the file hadn't been
opened at all or dup2() our fd over the current one (and close
our fd).
Critical section is protected by mutex; yes, system-wide. All
we do under it is a bunch of comparisons and maybe an overwriting
dup2() on host.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
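The three cases above can be sketched as a pure decision function, evaluated under the mutex after the new descriptor has been opened. This is one reading of the description, not the hostfs code; the toy_* names, the mode bits and the "fd < 0 means not opened" convention are assumptions.

```c
#define TOY_MODE_READ  0x1
#define TOY_MODE_WRITE 0x2

enum toy_action {
    TOY_CLOSE_OURS,   /* current fd is already good enough for us */
    TOY_RETRY,        /* our fd cannot cover the current mode */
    TOY_INSTALL,      /* file was not open at all: install our fd */
    TOY_DUP2_OVER     /* dup2() our fd over the current one */
};

/*
 * cur_mode/cur_fd: what the inode holds right now (cur_fd < 0 if none).
 * want:            the mode this opener needs.
 * our_mode:        the mode our freshly opened fd was opened with.
 */
enum toy_action toy_decide(int cur_mode, int cur_fd, int want, int our_mode)
{
    /* Current mode has grown enough for us: drop our new fd. */
    if ((cur_mode & want) == want)
        return TOY_CLOSE_OURS;

    /* Current mode has bits our fd lacks: close and repeat. */
    if (cur_mode & ~our_mode)
        return TOY_RETRY;

    /* Our fd covers everything: install it or overwrite in place. */
    return cur_fd < 0 ? TOY_INSTALL : TOY_DUP2_OVER;
}
```

Because dup2() replaces the descriptor in place, other openers keep a valid descriptor number throughout, which is the point of not closing it blindly.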
Al Viro [Mon, 7 Jun 2010 03:19:04 +0000 (23:19 -0400)]
leak in hostfs_unlink()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 03:16:34 +0000 (23:16 -0400)]
hostfs: fix races in dentry_name() and inode_name()
calculating size, then doing allocation, then filling the
path is a Bad Idea(tm), since the ancestors can be renamed,
leading to buffer overrun.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
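A minimal sketch of one way to avoid the size-then-fill pattern: build the path backwards into a caller-supplied buffer and check the remaining space at every step, so a concurrent rename can at worst cause a failure, never an overrun. The toy_* names are hypothetical and the real fix additionally relies on kernel locking that is not modelled here.

```c
#include <stddef.h>
#include <string.h>

/* Toy dentry: a name plus a pointer to the parent (NULL at the root). */
struct toy_dentry {
    const char *name;
    const struct toy_dentry *parent;
};

/*
 * Fill buf from the end towards the start. Returns a pointer to the
 * first character of the path inside buf, or NULL if it does not fit.
 */
char *toy_dentry_path(const struct toy_dentry *d, char *buf, size_t buflen)
{
    char *p;

    if (buflen < 2)
        return NULL;

    p = buf + buflen;
    *--p = '\0';

    if (d->parent == NULL) {        /* the root itself is just "/" */
        *--p = '/';
        return p;
    }

    while (d->parent != NULL) {
        size_t len = strlen(d->name);

        /* Need room for the component and its leading slash. */
        if ((size_t)(p - buf) < len + 1)
            return NULL;

        p -= len;
        memcpy(p, d->name, len);
        *--p = '/';
        d = d->parent;
    }
    return p;
}
```

The length of each component is read and copied in the same step, so there is no window between measuring and filling.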
Al Viro [Mon, 7 Jun 2010 02:31:14 +0000 (22:31 -0400)]
new helper: __dentry_path()
builds path relative to fs root, called under dcache_lock,
doesn't append any nonsense to unlinked ones.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 01:51:16 +0000 (21:51 -0400)]
hostfs: sanitize symlinks
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 00:42:10 +0000 (20:42 -0400)]
hostfs: get rid of inode_dentry_name()
it's equivalent to dentry_name() anyway
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 00:33:12 +0000 (20:33 -0400)]
hostfs: get rid of file_type(), fold init_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 7 Jun 2010 00:08:56 +0000 (20:08 -0400)]
switch stat_file() to passing a single struct rather than fsckloads of pointers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 23:38:18 +0000 (19:38 -0400)]
hostfs: pass pathname to init_inode()
We will calculate it in all callers anyway, so there's no
need to duplicate that inside. Moreover, that way we lose
all failure exits in init_inode(), so it doesn't need to
return anything.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 22:43:19 +0000 (18:43 -0400)]
get rid of hostfs_read_inode()
There are only two call sites; in one (hostfs_iget()) it's actually
a no-op and in another (fill_super()) it's easier to expand the
damn thing and use what we know about its arguments to simplify
it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 21:53:01 +0000 (17:53 -0400)]
hostfs: don't keep a field in each inode when we are using it only in root
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 19:16:17 +0000 (15:16 -0400)]
stop icache pollution in hostfs, switch to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 14:16:41 +0000 (10:16 -0400)]
switch affs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 14:12:01 +0000 (10:12 -0400)]
switch omfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 13:50:39 +0000 (09:50 -0400)]
switch bfs to ->evict_inode(), clean up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 11:08:19 +0000 (07:08 -0400)]
convert ext3 to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 6 Jun 2010 01:20:32 +0000 (21:20 -0400)]
spufs conversion to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 23:40:56 +0000 (19:40 -0400)]
switch ufs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 23:28:32 +0000 (19:28 -0400)]
convert fatfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 23:22:50 +0000 (19:22 -0400)]
switch smbfs to evict_inode()
NB: treatment of inode hash is completely braindead there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 23:16:20 +0000 (19:16 -0400)]
switch sysv to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 23:10:41 +0000 (19:10 -0400)]
switch shmem.c to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 20:29:45 +0000 (16:29 -0400)]
switch mqueue to ->evict_inode()
... and since the inodes are never hashed, we can use default ->drop_inode()
just fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 03:32:28 +0000 (23:32 -0400)]
merge ext2 delete_inode and clear_inode, switch to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 21 Jul 2010 21:22:47 +0000 (01:22 +0400)]
Don't dirty the victim in ext2_xattr_delete_inode()
... it's beyond fs-writeback reach already - writeback won't
be started at that point.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 21 Jul 2010 21:19:42 +0000 (01:19 +0400)]
Take dirtying the inode to callers of ext2_free_blocks()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 21 Jul 2010 21:13:36 +0000 (01:13 +0400)]
ext2: switch to dquot_free_block_nodirty()
brute-force conversion
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 02:27:38 +0000 (22:27 -0400)]
switch minix to ->evict_inode(), fix write_inode/delete_inode race
We need to wait for completion of possible writeback in progress
before we clear on-disk inode during deletion.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 02:21:54 +0000 (22:21 -0400)]
switch sysfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 02:17:56 +0000 (22:17 -0400)]
switch procfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 01:19:01 +0000 (21:19 -0400)]
simplify get_cramfs_inode()
simply don't hash the inodes that don't have real inumber instead of
skipping them during iget5_locked(); as the result, simple iget_locked()
would do and we can get rid of cramfs ->drop_inode() as well.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 01:02:59 +0000 (21:02 -0400)]
switch hypfs to ->evict_inode()
... and since we never hash its inodes, default
->drop_inode() will work just fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 5 Jun 2010 00:55:25 +0000 (20:55 -0400)]
new helper: end_writeback()
Essentially, the minimal variant of ->evict_inode(). It's
a trimmed-down clear_inode(), sans any fs callbacks. Once
it returns we know that no async writeback will be happening;
every ->evict_inode() instance should do that once and do that
before doing anything ->write_inode() could interfere with
(e.g. freeing the on-disk inode).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
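A minimal sketch of the ordering rule stated above, assuming a toy inode that records whether writeback could still be running when the on-disk inode is freed. The toy_* names are hypothetical stand-ins, not kernel functions.

```c
#include <stdbool.h>

struct toy_inode {
    unsigned int nlink;            /* 0 means the file was deleted */
    bool writeback_possible;       /* async writeback may still run */
    bool on_disk_freed;            /* on-disk inode has been released */
    bool freed_while_writeback;    /* set if the ordering rule was broken */
};

/* Stand-in for end_writeback(): afterwards no async writeback happens. */
static void toy_end_writeback(struct toy_inode *inode)
{
    inode->writeback_possible = false;
}

/* Stand-in for the filesystem freeing its on-disk inode. */
static void toy_free_on_disk(struct toy_inode *inode)
{
    if (inode->writeback_possible)
        inode->freed_while_writeback = true;
    inode->on_disk_freed = true;
}

/* Shape of an ->evict_inode() instance following the rule. */
void toy_evict_inode(struct toy_inode *inode)
{
    /* Exactly once, and before anything ->write_inode() could race with. */
    toy_end_writeback(inode);

    if (inode->nlink == 0)
        toy_free_on_disk(inode);
}
```

Swapping the two calls in toy_evict_inode() would set freed_while_writeback, which is the race the rule is meant to exclude.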
Al Viro [Sat, 5 Jun 2010 00:19:55 +0000 (20:19 -0400)]
Take ->i_bdev/->i_cdev handling out of clear_inode()
All call chains to clear_inode() pass through evict_inode() and
clear_inode() should be called by evict_inode() exactly once.
So we can pull i_bdev/i_cdev detaching up to evict_inode() itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 4 Jun 2010 23:56:17 +0000 (19:56 -0400)]
generic_detach_inode() can be static now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 4 Jun 2010 23:52:12 +0000 (19:52 -0400)]
switch hugetlbfs to ->evict_inode()
The first spoils - hugetlb can use default ->drop_inode() now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 4 Jun 2010 23:40:39 +0000 (19:40 -0400)]
New method - evict_inode()
Hybrid of ->clear_inode() and ->delete_inode(); if present, does
all fs work to be done when in-core inode is about to be gone,
for whatever reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 4 Jun 2010 23:33:20 +0000 (19:33 -0400)]
unify fs/inode.c callers of clear_inode()
For now, just a straightforward merge
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 2 Jun 2010 21:38:30 +0000 (17:38 -0400)]
simplify checks for I_CLEAR/I_FREEING
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information. As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
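A minimal sketch of why the checks get simpler, assuming two illustrative state bits. The TOY_* values and toy_* helpers are hypothetical; only the relationship between the two flags is taken from the description above.

```c
/* Illustrative bit values; the real ones are not taken from the patch. */
#define TOY_I_FREEING 0x10u
#define TOY_I_CLEAR   0x20u

/* Old scheme: clearing replaced I_FREEING with I_CLEAR. */
unsigned int toy_clear_old(void)
{
    return TOY_I_CLEAR;
}

/* New scheme: I_CLEAR is added on top of I_FREEING. */
unsigned int toy_clear_new(void)
{
    return TOY_I_FREEING | TOY_I_CLEAR;
}

/* Old check: "is this inode going away?" had to look at both bits. */
int toy_going_away_old(unsigned int state)
{
    return (state & (TOY_I_FREEING | TOY_I_CLEAR)) != 0;
}

/* New check: I_FREEING alone is enough. */
int toy_going_away_new(unsigned int state)
{
    return (state & TOY_I_FREEING) != 0;
}
```

Only code that cares whether clear_inode() has already run still needs to test the I_CLEAR bit.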
Al Viro [Sun, 4 Jul 2010 08:24:09 +0000 (12:24 +0400)]
get rid of file_fsync()
Copy and simplify in the only two users remaining.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Mon, 14 Jun 2010 09:17:31 +0000 (05:17 -0400)]
xfs: new truncate sequence
Convert XFS to the new truncate sequence. We still can have errors after
updating the file size in xfs_setattr, but these are real I/O errors and lead
to a transaction abort and filesystem shutdown, so they are not an issue.
Errors from ->write_begin and write_end can now be handled correctly because
we can actually get rid of the delalloc extents, whereas previously the buffer
state was stripped in block_invalidatepage.
There is still no error handling for ->direct_IO, because doing so will need
some major restructuring given that we only have the iolock shared and do not
hold i_mutex at all. Fortunately leaving the normally allocated blocks behind
there is not a major issue and this will get cleaned up by xfs_free_eofblocks
later.
Note: the patch is against Al's vfs.git tree as that contains the necessary
preparations. I'd prefer to get it applied there so that we can get some
testing in linux-next.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Boaz Harrosh [Wed, 9 Jun 2010 15:23:18 +0000 (18:23 +0300)]
exofs: New truncate sequence
These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.
* Remove the deprecated ->truncate vector. Let exofs_setattr
take care of on-disk size updates.
* Call truncate_pagecache on the unused pages if
write_begin/end fails.
* Cleanup exofs_delete_inode that did stupid inode
writes and updates on an inode that will be
removed.
* And finally get rid of exofs_get_block. We never
had any blocks; it was all for calling nobh_truncate_page.
nobh_truncate_page is not actually needed in exofs since
the last page is complete and gone, just like all the other
pages. There are no partial blocks in exofs.
I've tested with this patch, and there are no apparent
failures, so far.
CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 8 Jun 2010 17:24:56 +0000 (13:24 -0400)]
jffs2: don't open-code iget_failed()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Mon, 7 Jun 2010 07:29:20 +0000 (09:29 +0200)]
update documentation for the new truncate sequence
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:30:04 +0000 (11:30 +0200)]
check ATTR_SIZE constraints in inode_change_ok
Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok. Also clean up and document inode_change_ok
to make this obvious.
As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error. This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere. Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.
Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.
Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
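A minimal sketch of the resulting ->setattr shape, assuming a toy inode with a single size limit. The toy_* names and the TOY_ATTR_SIZE value are hypothetical; the real inode_change_ok performs more checks than the size test shown here.

```c
#include <errno.h>

#define TOY_ATTR_SIZE 0x8u   /* illustrative "size is being changed" bit */

struct toy_inode {
    long long size;        /* current file size */
    long long max_bytes;   /* largest size the filesystem allows */
};

struct toy_iattr {
    unsigned int valid;    /* which attributes are being set */
    long long size;        /* requested size if TOY_ATTR_SIZE is set */
};

/* All constraint checks happen here, before anything is modified. */
int toy_change_ok(const struct toy_inode *inode, const struct toy_iattr *attr)
{
    if (attr->valid & TOY_ATTR_SIZE) {
        if (attr->size < 0)
            return -EINVAL;
        if (attr->size > inode->max_bytes)
            return -EFBIG;
    }
    return 0;
}

/* Size update that cannot fail, because the checks already ran. */
void toy_truncate_setsize(struct toy_inode *inode, long long newsize)
{
    inode->size = newsize;
}

int toy_setattr(struct toy_inode *inode, const struct toy_iattr *attr)
{
    int err = toy_change_ok(inode, attr);

    if (err)
        return err;

    if (attr->valid & TOY_ATTR_SIZE)
        toy_truncate_setsize(inode, attr->size);
    return 0;
}
```

With the check moved to the front, the size helper has no error path left, which is what lets the setattr implementations shrink.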
Christoph Hellwig [Fri, 4 Jun 2010 09:30:03 +0000 (11:30 +0200)]
always call inode_change_ok early in ->setattr
Make sure we call inode_change_ok before doing any changes in ->setattr,
and make sure to call it even if our fs wants to ignore normal UNIX
permissions, but use the ATTR_FORCE to skip those.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:30:02 +0000 (11:30 +0200)]
remove inode_setattr
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.
In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:
spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:30:01 +0000 (11:30 +0200)]
default to simple_setattr
With the new truncate sequence every filesystem that wants to support file
size changes on disk needs to implement its own ->setattr. So instead
of calling inode_setattr which supports size changes call into a simple
method that doesn't support this. simple_setattr is almost what we
want except that it does not mark the inode dirty after changes. Given
that marking the inode dirty is a no-op for the simple in-memory filesystems
that use simple_setattr currently, just add the mark_inode_dirty call.
Also add a WARN_ON for the presence of a truncate method to simple_setattr
to catch new instances of it during the transition period.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:30:00 +0000 (11:30 +0200)]
rename generic_setattr
Despite its name it's not a generic implementation of ->setattr, but
rather a helper to copy attributes from a struct iattr to the inode.
Rename it to setattr_copy to reflect this fact.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:59 +0000 (11:29 +0200)]
add missing setattr methods
For the new truncate sequence every filesystem that wants to truncate on-disk
state needs a setattr method. Convert the remaining filesystems that implement
the truncate inode operation to have their own setattr method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:58 +0000 (11:29 +0200)]
get rid of block_write_begin_newtrunc
Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.
While we're at it also remove several unused arguments to block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:57 +0000 (11:29 +0200)]
introduce __block_write_begin
Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated. Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:56 +0000 (11:29 +0200)]
clean up write_begin usage for directories in pagecache
For filesystems that implement directories in pagecache we call
block_write_begin with an already allocated page for this code, while the
normal regular file write path uses the default block_write_begin behaviour.
Get rid of the __foofs_write_begin helper and opencode the normal write_begin
call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for
the directory code. The added benefit is that foofs_prepare_chunk has
a much saner calling convention.
Note that the interruptible flag passed into block_write_begin is always
ignored if we already pass in a page (see next patch for details), and
we never were doing truncations of excessive blocks for this case either so we
can switch directly to block_write_begin_newtrunc.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:55 +0000 (11:29 +0200)]
get rid of cont_write_begin_newtrunc
Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to cont_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:54 +0000 (11:29 +0200)]
get rid of nobh_write_begin_newtrunc
Move the call to vmtruncate to get rid of excessive blocks to the only
remaining caller and rename the non-truncating version to nobh_write_begin.
Get rid of the superfluous file argument to it while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 4 Jun 2010 09:29:53 +0000 (11:29 +0200)]
sort out blockdev_direct_IO variants
Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
parameters is shorter than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 4 Jul 2010 08:23:11 +0000 (12:23 +0400)]
fix leak in __logfs_create()
if kmalloc fails, we still need to drop the inode, as we do
on other failure exits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 4 Jul 2010 08:18:57 +0000 (12:18 +0400)]
Fix reiserfs_file_release()
a) count file openers correctly; i_count use was completely wrong
b) use new mutex for exclusion between final close/open/truncate,
to protect tailpacking logic. i_mutex use was wrong and resulted
in deadlocks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
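A minimal single-threaded sketch of point a), assuming a dedicated opener counter kept separately from the inode reference count. The toy_* names are hypothetical; in the kernel the counter and the tail packing would be protected by the new mutex from point b), which is not modelled here.

```c
struct toy_rinode {
    int refcount;      /* inode references: held by many things besides open files */
    int openers;       /* number of files currently open on this inode */
    int tail_packed;   /* set once the final close has packed the tail */
};

/* Every open bumps the opener count (and also pins the inode). */
void toy_open(struct toy_rinode *inode)
{
    inode->refcount++;
    inode->openers++;
}

/* Only the close that brings openers to zero does the final-close work. */
void toy_release(struct toy_rinode *inode)
{
    inode->refcount--;
    if (--inode->openers == 0)
        inode->tail_packed = 1;
}

/* Something unrelated to open files taking a reference, e.g. a lookup. */
void toy_grab(struct toy_rinode *inode)
{
    inode->refcount++;
}
```

With an extra reference from toy_grab() in place, a test on refcount would never see the "last user" value at final close, while the opener count still reaches zero.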
Al Viro [Mon, 7 Jun 2010 03:56:02 +0000 (23:56 -0400)]
missing include in hppfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 21 Apr 2009 05:27:08 +0000 (01:27 -0400)]
Deal with missing exports for hostfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Tue, 3 Aug 2010 21:33:38 +0000 (14:33 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits)
xfs: simplify and speed up direct I/O completions
xfs: move aio completion after unwritten extent conversion
direct-io: move aio_complete into ->end_io
xfs: fix big endian build
xfs: clean up xfs_bmap_get_bp
xfs: simplify xfs_truncate_file
xfs: kill the b_strat callback in xfs_buf
xfs: remove obsolete osyncisosync mount option
xfs: clean up filestreams helpers
xfs: fix gcc 4.6 set but not read and unused statement warnings
xfs: Fix build when CONFIG_XFS_POSIX_ACL=n
xfs: fix unsigned underflow in xfs_free_eofblocks
xfs: use GFP_NOFS for page cache allocation
xfs: fix memory reclaim recursion deadlock on locked inode buffer
xfs: fix xfs_trans_add_item() lockdep warnings
xfs: simplify and remove xfs_ireclaim
xfs: don't block on buffer read errors
xfs: move inode shrinker unregister even earlier
xfs: remove a dmapi leftover
xfs: writepage always has buffers
...
Linus Torvalds [Tue, 3 Aug 2010 21:33:09 +0000 (14:33 -0700)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (29 commits)
cifs: fsc should not default to "on"
[CIFS] remove redundant path walking in dfs_do_refmount
cifs: ignore the "mand", "nomand" and "_netdev" mount options
cifs: map NT_STATUS_ERROR_WRITE_PROTECTED to -EROFS
cifs: don't allow cifs_iget to match inodes of the wrong type
[CIFS] relinquish fscache cookie before freeing CIFSTconInfo
cifs: add separate cred_uid field to sesInfo
fs: cifs: check kmalloc() result
[CIFS] Missing ifdef
[CIFS] Missing line from previous commit
[CIFS] Fix build break when CONFIG_CIFS_FSCACHE disabled
cifs: add mount option to enable local caching
cifs: read pages from FS-Cache
cifs: store pages into local cache
cifs: FS-Cache page management
cifs: define inode-level cache object and register them
cifs: define superblock-level cache index objects and register them
cifs: remove unused cifsUidInfo struct
cifs: clean up cifs_find_smb_ses (try #2)
cifs: match secType when searching for existing tcp session
...
Linus Torvalds [Tue, 3 Aug 2010 21:31:24 +0000 (14:31 -0700)]
Merge branch 'devel' of /home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (291 commits)
ARM: AMBA: Add pclk support to AMBA bus infrastructure
ARM: 6278/2: fix regression in RealView after the introduction of pclk
ARM: 6277/1: mach-shmobile: Allow users to select HZ, default to 128
ARM: 6276/1: mach-shmobile: remove duplicate NR_IRQS_LEGACY
ARM: 6246/1: mmci: support larger MMCIDATALENGTH register
ARM: 6245/1: mmci: enable hardware flow control on Ux500 variants
ARM: 6244/1: mmci: add variant data and default MCICLOCK support
ARM: 6243/1: mmci: pass power_mode to the translate_vdd callback
ARM: 6274/1: add global control registers definition header file for nuc900
mx2_camera: fix type of dma buffer virtual address pointer
mx2_camera: Add soc_camera support for i.MX25/i.MX27
arm/imx/gpio: add spinlock protection
ARM: Add support for the LPC32XX arch
ARM: LPC32XX: Arch config menu support and makefiles
ARM: LPC32XX: Phytec 3250 platform support
ARM: LPC32XX: Misc support functions
ARM: LPC32XX: Serial support code
ARM: LPC32XX: System suspend support
ARM: LPC32XX: GPIO, timer, and IRQ drivers
ARM: LPC32XX: Clock driver
...
Helge Deller [Mon, 2 Aug 2010 20:46:41 +0000 (22:46 +0200)]
PARISC: led.c - fix potential stack overflow in led_proc_write()
avoid potential stack overflow by correctly checking count parameter
Reported-by: Ilja <ilja@netric.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
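A minimal sketch of checking the count parameter before copying into a fixed-size buffer, assuming the write is clamped to the buffer size rather than rejected. The toy_* names and TOY_BUF_LEN are hypothetical and this is not the led.c code.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUF_LEN 64   /* size of the fixed buffer; illustrative value */

/*
 * Copy at most TOY_BUF_LEN - 1 bytes from src into out (which must
 * hold TOY_BUF_LEN bytes) and NUL-terminate. Returns the number of
 * bytes actually stored.
 */
size_t toy_led_write(char *out, const char *src, size_t count)
{
    /* Check the caller-supplied length before touching the buffer. */
    if (count >= TOY_BUF_LEN)
        count = TOY_BUF_LEN - 1;

    memcpy(out, src, count);
    out[count] = '\0';
    return count;
}
```

Without the clamp, a count larger than the buffer would let the copy run past the end of an on-stack array.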
Alex Elder [Mon, 2 Aug 2010 15:24:57 +0000 (10:24 -0500)]
Merge branch 'v2.6.35'
Jeff Layton [Mon, 26 Jul 2010 18:25:08 +0000 (14:25 -0400)]
cifs: fsc should not default to "on"
I'm not sure why this was merged with this flag hardcoded on, but it
seems quite dangerous. Turn it off.
Also, mount.cifs hands unrecognized options off to the kernel so there
should be no need for changes there in order to support this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>