openwrt/staging/blogic.git
14 years agofs: switch bdev inode bdi's correctly
Dave Chinner [Thu, 21 Oct 2010 00:49:26 +0000 (11:49 +1100)]
fs: switch bdev inode bdi's correctly

bdev inodes can remain dirty even after their last close. Hence the
BDI associated with the bdev->inode gets modified duringthe last
close to point to the default BDI. However, the bdev inode still
needs to be moved to the dirty lists of the new BDI, otherwise it
will corrupt the writeback list is was left on.

Add a new function bdev_inode_switch_bdi() to move all the bdi state
from the old bdi to the new one safely. This is only a temporary
measure until the bdev inode<->bdi lifecycle problems are sorted
out.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: fix buffer invalidation in invalidate_list
Christoph Hellwig [Sat, 23 Oct 2010 17:07:20 +0000 (19:07 +0200)]
fs: fix buffer invalidation in invalidate_list

We must not call invalidate_inode_buffers in invalidate_list unless the
inode can be reclaimed.  If we remove the buffer association of a busy
inode fsync won't find the buffers anymore.  As invalidate_inode_buffers
is called from various others sources than umount this actually does
matter in practice.

While at it change the loop to a more natural form and remove the
WARN_ON for I_NEW, wich we already tested a few lines above.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofsnotify: use dget_parent
Christoph Hellwig [Sun, 10 Oct 2010 09:36:30 +0000 (05:36 -0400)]
fsnotify: use dget_parent

Use dget_parent instead of opencoding it.  This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

It means we do grab a reference to the parent now if need to be watched,
but not with the specified mask.  If this turns out to be a problem
we'll have to revisit it, but for now let's keep as much as possible
dcache internals inside dcache.[ch].

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agosmbfs: use dget_parent
Christoph Hellwig [Sun, 10 Oct 2010 09:36:29 +0000 (05:36 -0400)]
smbfs: use dget_parent

Use dget_parent instead of opencoding it.  This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

Note that the d_time assignment in smb_renew_times moves out of d_lock,
but it's a single atomic 32-bit value, and that's what other sites
setting it do already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoexportfs: use dget_parent
Christoph Hellwig [Wed, 13 Oct 2010 15:56:37 +0000 (11:56 -0400)]
exportfs: use dget_parent

Use dget_parent instead of opencoding it.  This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: use RCU read side protection in d_validate
Christoph Hellwig [Sun, 10 Oct 2010 09:36:27 +0000 (05:36 -0400)]
fs: use RCU read side protection in d_validate

d_validate does a purely read lookup in the dentry hash, so use RCU read side
locking instead of dcache_lock.  Split out from a larget patch by
Nick Piggin <npiggin@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: clean up dentry lru modification
Christoph Hellwig [Sun, 10 Oct 2010 09:36:26 +0000 (05:36 -0400)]
fs: clean up dentry lru modification

Always do a list_del_init on the LRU to make sure the list_empty invariant for
not beeing on the LRU always holds true, and fold dentry_lru_del_init into
dentry_lru_del.  Replace the dentry_lru_add_tail primitive with a
dentry_lru_move_tail operations that simpler when the dentry already is one
the list, which is always is.  Move the list_empty into dentry_lru_add to
fit the scheme of the other lru helpers, and simplify locking once we
move to a separate LRU lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: split __shrink_dcache_sb
Christoph Hellwig [Sun, 10 Oct 2010 09:36:25 +0000 (05:36 -0400)]
fs: split __shrink_dcache_sb

Currently __shrink_dcache_sb has an extremly awkward calling convention
because it tries to please very different callers.  Split out the
main loop into a shrink_dentry_list helper, which gets called directly
from shrink_dcache_sb for the cases where all dentries need to be pruned,
or from __shrink_dcache_sb for pruning only a certain number of dentries.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: improve DCACHE_REFERENCED usage
Nick Piggin [Sun, 10 Oct 2010 09:36:24 +0000 (05:36 -0400)]
fs: improve DCACHE_REFERENCED usage

dentry referenced bit is only set when installing the dentry back
onto the LRU. However with lazy LRU, the dentry can already be on
the LRU list at dput time, thus missing out on setting the referenced
bit. Fix this.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: use percpu counter for nr_dentry and nr_dentry_unused
Christoph Hellwig [Sun, 10 Oct 2010 09:36:23 +0000 (05:36 -0400)]
fs: use percpu counter for nr_dentry and nr_dentry_unused

The nr_dentry stat is a globally touched cacheline and atomic operation
twice over the lifetime of a dentry. It is used for the benfit of userspace
only. Turn it into a per-cpu counter and always decrement it in d_free instead
of doing various batching operations to reduce lock hold times in the callers.

Based on an earlier patch from Nick Piggin <npiggin@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: simplify __d_free
Christoph Hellwig [Sun, 10 Oct 2010 09:36:22 +0000 (05:36 -0400)]
fs: simplify __d_free

Remove d_callback and always call __d_free with a RCU head.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: take dcache_lock inside __d_path
Christoph Hellwig [Sun, 10 Oct 2010 09:36:21 +0000 (05:36 -0400)]
fs: take dcache_lock inside __d_path

All callers take dcache_lock just around the call to __d_path, so
take the lock into it in preparation of getting rid of dcache_lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: do not assign default i_ino in new_inode
Christoph Hellwig [Sat, 23 Oct 2010 15:19:54 +0000 (11:19 -0400)]
fs: do not assign default i_ino in new_inode

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: introduce a per-cpu last_ino allocator
Eric Dumazet [Sat, 23 Oct 2010 15:18:01 +0000 (11:18 -0400)]
fs: introduce a per-cpu last_ino allocator

new_inode() dirties a contended cache line to get increasing
inode numbers. This limits performance on workloads that cause
significant parallel inode allocation.

Solve this problem by using a per_cpu variable fed by the shared
last_ino in batches of 1024 allocations.  This reduces contention on
the shared last_ino, and give same spreading ino numbers than before
(i.e. same wraparound after 2^32 allocations).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agonew helper: ihold()
Al Viro [Sat, 23 Oct 2010 15:11:40 +0000 (11:11 -0400)]
new helper: ihold()

Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: remove inode_add_to_list/__inode_add_to_list
Christoph Hellwig [Sat, 23 Oct 2010 11:15:32 +0000 (07:15 -0400)]
fs: remove inode_add_to_list/__inode_add_to_list

Split up inode_add_to_list/__inode_add_to_list.  Locking for the two
lists will be split soon so these helpers really don't buy us much
anymore.

The __ prefixes for the sb list helpers will go away soon, but until
inode_lock is gone we'll need them to distinguish between the locked
and unlocked variants.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: move i_count increments into find_inode/find_inode_fast
Christoph Hellwig [Sat, 23 Oct 2010 11:09:06 +0000 (07:09 -0400)]
fs: move i_count increments into find_inode/find_inode_fast

Now that iunique is not abusing find_inode anymore we can move the i_ref
increment back to where it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: Stop abusing find_inode_fast in iunique
Christoph Hellwig [Sat, 23 Oct 2010 11:00:16 +0000 (07:00 -0400)]
fs: Stop abusing find_inode_fast in iunique

Stop abusing find_inode_fast for iunique and opencode the inode hash walk.
Introduce a new iunique_lock to protect the iunique counters once inode_lock
is removed.

Based on a patch originally from Nick Piggin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: Factor inode hash operations into functions
Dave Chinner [Sat, 23 Oct 2010 10:58:09 +0000 (06:58 -0400)]
fs: Factor inode hash operations into functions

Before replacing the inode hash locking with a more scalable
mechanism, factor the removal of the inode from the hashes rather
than open coding it in several places.

Based on a patch originally from Nick Piggin.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: Implement lazy LRU updates for inodes
Nick Piggin [Sat, 23 Oct 2010 10:55:17 +0000 (06:55 -0400)]
fs: Implement lazy LRU updates for inodes

Convert the inode LRU to use lazy updates to reduce lock and
cacheline traffic.  We avoid moving inodes around in the LRU list
during iget/iput operations so these frequent operations don't need
to access the LRUs. Instead, we defer the refcount checks to
reclaim-time and use a per-inode state flag, I_REFERENCED, to tell
reclaim that iget has touched the inode in the past. This means that
only reclaim should be touching the LRU with any frequency, hence
significantly reducing lock acquisitions and the amount contention
on LRU updates.

This also removes the inode_in_use list, which means we now only
have one list for tracking the inode LRU status. This makes it much
simpler to split out the LRU list operations under it's own lock.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: Convert nr_inodes and nr_unused to per-cpu counters
Dave Chinner [Sat, 23 Oct 2010 09:03:02 +0000 (05:03 -0400)]
fs: Convert nr_inodes and nr_unused to per-cpu counters

The number of inodes allocated does not need to be tied to the
addition or removal of an inode to/from a list. If we are not tied
to a list lock, we could update the counters when inodes are
initialised or destroyed, but to do that we need to convert the
counters to be per-cpu (i.e. independent of a lock). This means that
we have the freedom to change the list/locking implementation
without needing to care about the counters.

Based on a patch originally from Eric Dumazet.

[AV: cleaned up a bit, fixed build breakage on weird configs

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agovfs: fix infinite loop caused by clone_mnt race
Miklos Szeredi [Tue, 5 Oct 2010 10:31:09 +0000 (12:31 +0200)]
vfs: fix infinite loop caused by clone_mnt race

If clone_mnt() happens while mnt_make_readonly() is running, the
cloned mount might have MNT_WRITE_HOLD flag set, which results in
mnt_want_write() spinning forever on this mount.

Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
accidentally.  But if it does happen it can hang the machine.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoswitch hfs to hlist_add_fake()
Al Viro [Sat, 23 Oct 2010 19:26:21 +0000 (15:26 -0400)]
switch hfs to hlist_add_fake()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agolist.h: new helper - hlist_add_fake()
Al Viro [Sat, 23 Oct 2010 19:23:40 +0000 (15:23 -0400)]
list.h: new helper - hlist_add_fake()

Make node look as if it was on hlist, with hlist_del()
working correctly.  Usable without any locking...

Convert a couple of places where we want to do that to
inode->i_hash.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agonew helper: inode_unhashed()
Al Viro [Sat, 23 Oct 2010 19:19:20 +0000 (15:19 -0400)]
new helper: inode_unhashed()

note: for race-free uses you inode_lock held

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agounexport invalidate_inodes
Al Viro [Sun, 24 Oct 2010 15:13:10 +0000 (11:13 -0400)]
unexport invalidate_inodes

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agosmbfs never retains inodes with zero refcount in the first place
Al Viro [Sun, 24 Oct 2010 14:48:15 +0000 (10:48 -0400)]
smbfs never retains inodes with zero refcount in the first place

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agontfs: don't call invalidate_inodes()
Al Viro [Sun, 24 Oct 2010 13:50:11 +0000 (09:50 -0400)]
ntfs: don't call invalidate_inodes()

We are in fill_super(); again, no inodes with zero i_count could
be around until we set MS_ACTIVE.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agogfs2: invalidate_inodes() is no-op there
Al Viro [Sun, 24 Oct 2010 13:46:50 +0000 (09:46 -0400)]
gfs2: invalidate_inodes() is no-op there

In fill_super() we hadn't MS_ACTIVE set yet, so there won't
be any inodes with zero i_count sitting around.

In put_super() we already have MS_ACTIVE removed *and* we
had called invalidate_inodes() since then.  So again there
won't be any inodes with zero i_count...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoext2_remount: don't bother with invalidate_inodes()
Al Viro [Sun, 24 Oct 2010 13:45:22 +0000 (09:45 -0400)]
ext2_remount: don't bother with invalidate_inodes()

It's pointless - we *do* have busy inodes (root directory,
for one), so that call will fail and attempt to change
XIP flag will be ignored.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs/buffer.c: call __block_write_begin() if we have page
Namhyung Kim [Mon, 25 Oct 2010 06:01:12 +0000 (15:01 +0900)]
fs/buffer.c: call __block_write_begin() if we have page

If we have the appropriate page already, call __block_write_begin()
directly instead of releasing and regrabbing it inside of
block_write_begin().

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agolockdep: fixup checking of dir inode annotation
Namhyung Kim [Mon, 11 Oct 2010 13:38:00 +0000 (22:38 +0900)]
lockdep: fixup checking of dir inode annotation

Since inode->i_mode shares its bits for S_IFMT, S_ISDIR should be
used to distinguish whether it is a dir or not.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoaio: bump i_count instead of using igrab
Chris Mason [Mon, 23 Aug 2010 14:47:55 +0000 (10:47 -0400)]
aio: bump i_count instead of using igrab

The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch.  igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle.  It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
14 years agoupdate block_device_operations documentation
Christoph Hellwig [Wed, 6 Oct 2010 08:46:53 +0000 (10:46 +0200)]
update block_device_operations documentation

Updated Documentation/filesystems/Locking to match the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
14 years agofs/buffer.c: remove duplicated assignment on b_private
Namhyung Kim [Sat, 16 Oct 2010 03:40:33 +0000 (12:40 +0900)]
fs/buffer.c: remove duplicated assignment on b_private

bh->b_private is initialized within init_buffer(), thus the
assignment should be redundant. Remove it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: move exportfs since it is not a networking filesystem
Randy Dunlap [Fri, 15 Oct 2010 23:32:27 +0000 (16:32 -0700)]
fs: move exportfs since it is not a networking filesystem

Move the EXPORTFS kconfig symbol out of the NETWORK_FILESYSTEMS block
since it provides a library function that can be (and is) used by other
(non-network) filesystems.

This also eliminates a kconfig dependency warning:

warning: (XFS_FS && BLOCK || NFSD && NETWORK_FILESYSTEMS && INET && FILE_LOCKING && BKL) selects EXPORTFS which has unmet direct dependencies (NETWORK_FILESYSTEMS)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agohfs: use sync_dirty_buffer
Christoph Hellwig [Wed, 6 Oct 2010 08:49:17 +0000 (10:49 +0200)]
hfs: use sync_dirty_buffer

Use sync_dirty_buffer instead of the incorrect opencoding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agovfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos
KAMEZAWA Hiroyuki [Fri, 1 Oct 2010 21:20:22 +0000 (14:20 -0700)]
vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos

Now, rw_verify_area() checsk f_pos is negative or not.  And if negative,
returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..  has
negative offsets.  And we can't do any access via read/write to the
file(device).

So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agohostfs: fix UML crash: remove f_spare from hostfs
Richard Weinberger [Tue, 19 Oct 2010 21:27:59 +0000 (14:27 -0700)]
hostfs: fix UML crash: remove f_spare from hostfs

365b1818 ("add f_flags to struct statfs(64)") resized f_spare within
struct statfs which caused a UML crash.  There is no need to copy f_spare.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoDocumentation: Fix trivial typo in filesystems/sharedsubtree.txt
Valerie Aurora [Mon, 30 Aug 2010 21:23:12 +0000 (17:23 -0400)]
Documentation: Fix trivial typo in filesystems/sharedsubtree.txt

Documentation: Fix trivial typo in filesystems/sharedsubtree.txt

This typo is easy to ignore unless you have spent a great deal of time
thinking about how to eliminate duplicate dentries in unions.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoaffs: testing the wrong variable
Dan Carpenter [Wed, 25 Aug 2010 07:05:27 +0000 (09:05 +0200)]
affs: testing the wrong variable

The intent was to verify that bh = affs_bread_ino(...) returned a valid
pointer.  We checked "ext_bh" earlier in the function and it's valid
here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: allow for more than 2^31 files
Eric Dumazet [Tue, 5 Oct 2010 07:32:55 +0000 (09:32 +0200)]
fs: allow for more than 2^31 files

Andrew,

Could you please review this patch, you probably are the right guy to
take it, because it crosses fs and net trees.

Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt
depend on previous patch (sysctl: fix min/max handling in
__do_proc_doulongvec_minmax())

Thanks !

[PATCH V4] fs: allow for more than 2^31 files

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

<quote>

We were seeing a failure which prevented boot.  The kernel was incapable
of creating either a named pipe or unix domain socket.  This comes down
to a common kernel function called unix_create1() which does:

        atomic_inc(&unix_nr_socks);
        if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

        n = (mempages * (PAGE_SIZE / 1024)) / 10;
        files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000).  That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.

</quote>

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of
atomic_t.

get_max_files() is changed to return an unsigned long.
get_nr_files() is changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, while not
strictly needed to address Robin problem.

Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704     0       2147483648

Reported-by: Robin Holt <holt@sgi.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Robin Holt <holt@sgi.com>
Tested-by: Robin Holt <holt@sgi.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoisofs: Fix isofs_get_blocks for 8TB files
Jan Kara [Mon, 4 Oct 2010 09:37:37 +0000 (11:37 +0200)]
isofs: Fix isofs_get_blocks for 8TB files

Currently isofs_get_blocks() was limited to handle only 4TB files on 32-bit
architectures because of unnecessary use of iblock variable which was signed
long. Just remove the variable. The error messages that were using this
variable should have rather used b_off anyway because that is the block we
are currently mapping.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: kill block_prepare_write
Christoph Hellwig [Wed, 6 Oct 2010 08:47:23 +0000 (10:47 +0200)]
fs: kill block_prepare_write

__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions.  Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: mark destroy_inode static
Christoph Hellwig [Wed, 6 Oct 2010 08:48:55 +0000 (10:48 +0200)]
fs: mark destroy_inode static

Hugetlbfs used to need it, but after the destroy_inode and evict_inode
changes it's not required anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: add sync_inode_metadata
Christoph Hellwig [Wed, 6 Oct 2010 08:48:20 +0000 (10:48 +0200)]
fs: add sync_inode_metadata

Add a new helper to write out the inode using the writeback code,
that is including the correct dirty bit and list manipulation.  A few
of filesystems already opencode this, and a lot of others should be
using it instead of using write_inode_now which also writes out the
data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agofs: move permission check back into __lookup_hash
Christoph Hellwig [Wed, 6 Oct 2010 08:47:47 +0000 (10:47 +0200)]
fs: move permission check back into __lookup_hash

The caller that didn't need it is gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoMerge branch 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 25 Oct 2010 17:59:31 +0000 (10:59 -0700)]
Merge branch 'davinci-for-linus' of git://git./linux/kernel/git/khilman/linux-davinci

* 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci: (50 commits)
  davinci: fix remaining board support after io_pgoffst removal
  davinci: mityomapl138: make file local data static
  arm/davinci: remove duplicated include
  davinci: Initial support for Omapl138-Hawkboard
  davinci: MityDSP-L138/MityARM-1808 read MAC address from I2C Prom
  davinci: add tnetv107x touchscreen platform device
  input: add driver for tnetv107x touchscreen controller
  davinci: add keypad config for tnetv107x evm board
  davinci: add tnetv107x keypad platform device
  input: add driver for tnetv107x on-chip keypad controller
  net: davinci_emac: cleanup unused cpdma code
  net: davinci_emac: switch to new cpdma layer
  net: davinci_emac: separate out cpdma code
  net: davinci_emac: cleanup unused mdio emac code
  omap: cleanup unused davinci mdio arch code
  davinci: cleanup mdio arch code and switch to phy_id
  net: davinci_emac: switch to new mdio
  omap: add mdio platform devices
  davinci: add mdio platform devices
  net: davinci_emac: separate out davinci mdio
  ...

Fix up trivial conflict in drivers/input/keyboard/Kconfig (two entries
added next to each other - one from the davinci merge, one from the
input merge)

14 years agoMerge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Linus Torvalds [Mon, 25 Oct 2010 17:08:21 +0000 (10:08 -0700)]
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Remove inode->i_count manipulation in exofs_new_inode
  fs/exofs: typo fix of faild to failed
  exofs: Set i_mapping->backing_dev_info anyway
  exofs: Cleaup read path in regard with read_for_write

14 years agox86-32, mm: Remove duplicated include
Borislav Petkov [Mon, 25 Oct 2010 16:15:22 +0000 (18:15 +0200)]
x86-32, mm: Remove duplicated include

Commit b40827fa7268 ("x86-32, mm: Add an initial page table for core
bootstrapping") added an include directive which is needless and is
taken care of by a previous one.  Remove it.

Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoexofs: Remove inode->i_count manipulation in exofs_new_inode
Boaz Harrosh [Sat, 16 Oct 2010 08:14:01 +0000 (19:14 +1100)]
exofs: Remove inode->i_count manipulation in exofs_new_inode

exofs_new_inode() was incrementing the inode->i_count and
decrementing it in create_done(), in a bad attempt to make sure
the inode will still be there when the asynchronous create_done()
finally arrives. This was very stupid because iput() was not called,
and if it was actually needed, it would leak the inode.

However all this is not needed, because at exofs_evict_inode()
we already wait for create_done() by waiting for the
object_created event. Therefore remove the superfluous ref counting
and just Thicken the comment at exofs_evict_inode() a bit.

While at it change places that open coded wait_obj_created()
to call the already available wrapper.

CC: Dave Chinner <dchinner@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
14 years agofs/exofs: typo fix of faild to failed
Joe Perches [Fri, 22 Oct 2010 05:17:17 +0000 (22:17 -0700)]
fs/exofs: typo fix of faild to failed

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
14 years agoMerge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
Linus Torvalds [Mon, 25 Oct 2010 15:36:50 +0000 (08:36 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (48 commits)
  [S390] topology: export cpu topology via proc/sysinfo
  [S390] topology: move topology sysinfo code
  [S390] topology: clean up facility detection
  [S390] cleanup facility list handling
  [S390] enable ARCH_DMA_ADDR_T_64BIT with 64BIT
  [S390] dasd: ignore unsolicited interrupts for DIAG
  [S390] kvm: Enable z196 instruction facilities
  [S390] dasd: fix unsolicited interrupt recognition
  [S390] dasd: fix use after free in dbf
  [S390] kvm: Fix badness at include/asm/mmu_context.h:83
  [S390] cio: fix I/O cancel function
  [S390] topology: change default
  [S390] smp: use correct cpu address in print_cpu_info()
  [S390] remove ieee_instruction_pointer from thread_struct
  [S390] cleanup system call parameter setup
  [S390] correct alignment of cpuid structure
  [S390] cleanup lowcore access from external interrupts
  [S390] cleanup lowcore access from program checks
  [S390] pgtable: move pte_mkhuge() from hugetlb.h to pgtable.h
  [S390] fix SIGBUS handling
  ...

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Mon, 25 Oct 2010 15:32:05 +0000 (08:32 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (365 commits)
  ALSA: hda - Disable sticky PCM stream assignment for AD codecs
  ALSA: usb - Creative USB X-Fi volume knob support
  ALSA: ca0106: Use card specific dac id for mute controls.
  ALSA: ca0106: Allow different sound cards to use different SPI channel mappings.
  ALSA: ca0106: Create a nice spot for mapping channels to dacs.
  ALSA: ca0106: Move enabling of front dac out of hardcoded setup sequence.
  ALSA: ca0106: Pull out dac powering routine into separate function.
  ALSA: ca0106 - add Sound Blaster 5.1vx info.
  ASoC: tlv320dac33: Use usleep_range for delays
  ALSA: usb-audio: add Novation Launchpad support
  ALSA: hda - Add workarounds for CT-IBG controllers
  ALSA: hda - Fix wrong TLV mute bit for STAC/IDT codecs
  ASoC: tpa6130a2: Error handling for broken chip
  ASoC: max98088: Staticise m98088_eq_band
  ASoC: soc-core: Fix codec->name memory leak
  ALSA: hda - Apply ideapad quirk to Acer laptops with Cxt5066
  ALSA: hda - Add some workarounds for Creative IBG
  ALSA: hda - Fix wrong SPDIF NID assignment for CA0110
  ALSA: hda - Fix codec rename rules for ALC662-compatible codecs
  ALSA: hda - Add alc_init_jacks() call to other codecs
  ...

14 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb
Linus Torvalds [Mon, 25 Oct 2010 15:30:48 +0000 (08:30 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/dvrabel/uwb

* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb:
  uwb: Orphan the UWB and WUSB subsystems
  uwb: Remove the WLP subsystem and drivers

14 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platf...
Linus Torvalds [Mon, 25 Oct 2010 15:28:13 +0000 (08:28 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/mjg59/platform-drivers-x86

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (44 commits)
  eeepc-wmi: Add cpufv sysfs interface
  eeepc-wmi: add additional hotkeys
  panasonic-laptop: Simplify calls to acpi_pcc_retrieve_biosdata
  panasonic-laptop: Handle errors properly if they happen
  intel_pmic_gpio: fix off-by-one value range checking
  IBM Real-Time "SMI Free" mode driver -v7
  Add OLPC XO-1 rfkill driver
  Move hdaps driver to platform/x86
  ideapad-laptop: Fix Makefile
  intel_pmic_gpio: swap the bits and mask args for intel_scu_ipc_update_register
  ideapad: Add param: no_bt_rfkill
  ideapad: Change the driver name to ideapad-laptop
  ideapad: rewrite the sw rfkill set
  ideapad: rewrite the hw rfkill notify
  ideapad: use EC command to control camera
  ideapad: use return value of _CFG to tell if device exist or not
  ideapad: make sure we bind on the correct device
  ideapad: check VPC bit before sync rfkill hw status
  ideapad: add ACPI helpers
  dell-laptop: Add debugfs support
  ...

14 years agoMerge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6
Linus Torvalds [Mon, 25 Oct 2010 15:19:14 +0000 (08:19 -0700)]
Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6

* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  mtd/m25p80: add support to parse the partitions by OF node
  of/irq: of_irq.c needs to include linux/irq.h
  of/mips: Cleanup some include directives/files.
  of/mips: Add device tree support to MIPS
  of/flattree: Eliminate need to provide early_init_dt_scan_chosen_arch
  of/device: Rework to use common platform_device_alloc() for allocating devices
  of/xsysace: Fix OF probing on little-endian systems
  of: use __be32 types for big-endian device tree data
  of/irq: remove references to NO_IRQ in drivers/of/platform.c
  of/promtree: add package-to-path support to pdt
  of/promtree: add of_pdt namespace to pdt code
  of/promtree: no longer call prom_ functions directly; use an ops structure
  of/promtree: make drivers/of/pdt.c no longer sparc-only
  sparc: break out some PROM device-tree building code out into drivers/of
  of/sparc: convert various prom_* functions to use phandle
  sparc: stop exporting openprom.h header
  powerpc, of_serial: Endianness issues setting up the serial ports
  of: MTD: Fix OF probing on little-endian systems
  of: GPIO: Fix OF probing on little-endian systems

14 years agoMIPS: MT: Fix build error iFPU affinity code
Ralf Baechle [Sun, 24 Oct 2010 21:23:50 +0000 (22:23 +0100)]
MIPS: MT: Fix build error iFPU affinity code

Commit b0ae19811375 ("security: remove unused parameter from
security_task_setscheduler()") broke the build of
arch/mips/kernel/mips-mt-fpaff.c.  The function arguments were
unnecessary, not the semicolon ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoMerge branch 'ieee1394-removal' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 25 Oct 2010 15:05:29 +0000 (08:05 -0700)]
Merge branch 'ieee1394-removal' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'ieee1394-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  ieee1394: remove the old IEEE 1394 driver stack
  ieee1394: move init_ohci1394_dma to drivers/firewire/

Fix trivial change/delete conflict: drivers/ieee1394/eth1394.c is
getting removed, but was modified by the networking merge.

14 years agoCoda: replace BKL with mutex
Yoshihisa Abe [Mon, 25 Oct 2010 06:03:46 +0000 (02:03 -0400)]
Coda: replace BKL with mutex

Replace the BKL with a mutex to protect the venus_comm structure which
binds the mountpoint with the character device and holds the upcall
queues.

Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoCoda: push BKL regions into coda_upcall()
Yoshihisa Abe [Mon, 25 Oct 2010 06:03:45 +0000 (02:03 -0400)]
Coda: push BKL regions into coda_upcall()

Now that shared inode state is locked using the cii->c_lock, the BKL is
only used to protect the upcall queues used to communicate with the
userspace cache manager. The remaining state is all local and we can
push the lock further down into coda_upcall().

Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoCoda: add spin lock to protect accesses to struct coda_inode_info.
Yoshihisa Abe [Mon, 25 Oct 2010 06:03:44 +0000 (02:03 -0400)]
Coda: add spin lock to protect accesses to struct coda_inode_info.

We mostly need it to protect cached user permissions. The c_flags field
is advisory, reading the wrong value is harmless and in the worst case
we hit a slow path where we have to make an extra upcall to the
userspace cache manager when revalidating a dentry or inode.

Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Mon, 25 Oct 2010 14:59:01 +0000 (07:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (75 commits)
  Input: wacom - specify Cinitq supported tools
  Input: ab8500-ponkey - fix IRQ freeing in error path
  Input: adp5588-keys - use more obvious i2c_device_id name string
  Input: ad7877 - switch to using threaded IRQ
  Input: ad7877 - use attribute group to control visibility of attributes
  Input: serio - add support for PS2Mult multiplexer protocol
  Input: wacom - properly enable runtime PM
  Input: ad7877 - filter events where pressure is beyond the maximum
  Input: ad7877 - implement EV_KEY:BTN_TOUCH reporting
  Input: ad7877 - implement specified chip select behavior
  Input: hp680_ts_input - use cancel_delayed_work_sync()
  Input: mousedev - correct lockdep annotation
  Input: ads7846 - switch to using threaded IRQ
  Input: serio - support multiple child devices per single parent
  Input: synaptics - simplify pass-through port handling
  Input: add ROHM BU21013 touch panel controller support
  Input: omap4-keypad - wake-up on events & long presses
  Input: omap4-keypad - fix interrupt line configuration
  Input: omap4-keypad - SYSCONFIG register configuration
  Input: omap4-keypad - use platform device helpers
  ...

14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
Linus Torvalds [Mon, 25 Oct 2010 14:51:49 +0000 (07:51 -0700)]
Merge git://git./linux/kernel/git/lethal/sh-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (110 commits)
  sh: i2c-sh7760: Replase from ctrl_* to __raw_*
  sh: clkfwk: Shuffle around to match the intc split up.
  sh: clkfwk: modify for_each_frequency end condition
  sh: fix clk_get() error handling
  sh: clkfwk: Fix fault in frequency iterator.
  sh: clkfwk: Add a helper for rate rounding by divisor ranges.
  sh: clkfwk: Abstract rate rounding helper.
  sh: clkfwk: support clock remapping.
  sh: pci: Convert to upper/lower_32_bits() helpers.
  sh: mach-sdk7786: Add support for the FPGA SRAM.
  sh: Provide a generic SRAM pool for tiny memories.
  sh: pci: Support secondary FPGA-driven PCIe clocks on SDK7786.
  sh: pci: Support slot 4 routing on SDK7786.
  sh: Fix up PMB locking.
  sh: mach-sdk7786: Add support for fpga gpios.
  sh: use pr_fmt for clock framework, too.
  sh: remove name and id from struct clk
  sh: free-without-alloc fix for sh_mobile_lcdcfb
  sh: perf: Set up perf_max_events.
  sh: perf: Support SH-X3 hardware counters.
  ...

Fix up trivial conflicts (perf_max_events got removed) in arch/sh/kernel/perf_event.c

14 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Mon, 25 Oct 2010 14:45:10 +0000 (07:45 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Revert "block: fix accounting bug on cross partition merges"

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Linus Torvalds [Mon, 25 Oct 2010 14:44:27 +0000 (07:44 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/gerg/m68knommu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: (21 commits)
  m68knommu: convert to using tracehook_report_syscall_*
  m68knommu: some boards use fixed phy for FEC ethernet
  m68knommu: support the external GPIO based interrupts of the 5272
  m68knommu: mask of vector bits in exception word properly
  m68knommu: change to new flag variables
  m68knommu: Fix MCFUART_TXFIFOSIZE for m548x.
  m68knommu: add basic mmu-less m548x support
  m68knommu: .gitignore vmlinux.lds
  m68knommu: stop using __do_IRQ
  m68knommu: rename PT_OFF_VECTOR to PT_OFF_FORMATVEC.
  m68knommu: add support for Coldfire 547x/548x interrupt controller
  m68k{nommu}: Remove unused DEFINE's from asm-offsets.c
  m68knommu: whitespace cleanup in 68328/entry.S
  m68knommu: Document supported chips in intc-2.c and intc-simr.c.
  m68knommu: fix strace support for 68328/68360
  m68knommu: fix default starting date
  arch/m68knommu: Removing dead 68328_SERIAL_UART2 config option
  arch/m68knommu: Removing dead RAM_{16,32}_MB config option
  arch/m68knommu: Removing dead M68KFPU_EMU config option
  arch/m68knommu: Removing dead RELOCATE config option
  ...

14 years ago[S390] topology: export cpu topology via proc/sysinfo
Heiko Carstens [Mon, 25 Oct 2010 14:10:54 +0000 (16:10 +0200)]
[S390] topology: export cpu topology via proc/sysinfo

Export the cpu configuration topology via sysinfo. Two new lines are
introduced:

CPU Topology HW:      0 0 0 4 6 4
CPU Topology SW:      0 0 0 0 4 24

The HW line describes the cpu topology nesting levels when the maximum
nesting level is used to get the corresponding SYSIB.
The SW line describes what Linux is actually using. In this case it
supports only two levels (CONFIG_SCHED_BOOK off) and therefore the
hardware folded the two lower levels in the SYSIB response block.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] topology: move topology sysinfo code
Heiko Carstens [Mon, 25 Oct 2010 14:10:53 +0000 (16:10 +0200)]
[S390] topology: move topology sysinfo code

Move the topology sysinfo SYSIB definitions to the proper place in
asm/sysinfo.h where they should be.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] topology: clean up facility detection
Heiko Carstens [Mon, 25 Oct 2010 14:10:52 +0000 (16:10 +0200)]
[S390] topology: clean up facility detection

Move cpu topology facility detection to early setup code where it
should be.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cleanup facility list handling
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:51 +0000 (16:10 +0200)]
[S390] cleanup facility list handling

Store the facility list once at system startup with stfl/stfle and
reuse the result for all facility tests.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] enable ARCH_DMA_ADDR_T_64BIT with 64BIT
FUJITA Tomonori [Mon, 25 Oct 2010 14:10:50 +0000 (16:10 +0200)]
[S390] enable ARCH_DMA_ADDR_T_64BIT with 64BIT

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] dasd: ignore unsolicited interrupts for DIAG
Stefan Haberland [Mon, 25 Oct 2010 14:10:49 +0000 (16:10 +0200)]
[S390] dasd: ignore unsolicited interrupts for DIAG

For the DASD DIAG discipline IO is started through special diagnose
calls. Unsolicited interrupts may contain information about the device
itself. But this information is not needed because the device is not
used directly.
Fix the case that an unimplemented dicipline function may be called
by ignoring unsolicited interrupts for the DIAG disciplin.

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] kvm: Enable z196 instruction facilities
Christian Borntraeger [Mon, 25 Oct 2010 14:10:48 +0000 (16:10 +0200)]
[S390] kvm: Enable z196 instruction facilities

Enable PFPO, floating point extension, distinct-operands,
fast-BCR-serialization, high-word, interlocked-access, load/store-
on-condition, and population-count facilities for guests.
(bits 37, 44 and 45).

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] dasd: fix unsolicited interrupt recognition
Stefan Weinhuber [Mon, 25 Oct 2010 14:10:47 +0000 (16:10 +0200)]
[S390] dasd: fix unsolicited interrupt recognition

The dasd interrupt handler needs to distinguish solicited from
unsolicited interrupts, as unsolicited interrupts may require special
handling (e.g. summary unit checks) and solicited interrupts require
proper error recovery for the failed I/O request.
The interrupt handler needs to check several bit fields in the
interrupt response block (irb) to make this distinction.
So far our check of the status control bits has not been specific
enough, which may lead to a failed request getting just retried
instead of the necessary error recovery.

Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] dasd: fix use after free in dbf
Sebastian Ott [Mon, 25 Oct 2010 14:10:46 +0000 (16:10 +0200)]
[S390] dasd: fix use after free in dbf

Writing to /proc/dasd/statistics while the debug level of the
generic dasd debug entry is set to DBF_DEBUG will lead to an
use after free when accessing the debug entry later.
Since for the format string "%s" in the s390 dbf only a pointer
to the string is stored in the debug feature and the buffer used
here is freed afterwards.

To fix this just remove the debug message.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] kvm: Fix badness at include/asm/mmu_context.h:83
Christian Borntraeger [Mon, 25 Oct 2010 14:10:45 +0000 (16:10 +0200)]
[S390] kvm: Fix badness at include/asm/mmu_context.h:83

commit 050eef364ad700590a605a0749f825cab4834b1e
    [S390] fix tlb flushing vs. concurrent /proc accesses
broke KVM on s390x. On every schedule a
Badness at include/asm/mmu_context.h:83 appears. s390_enable_sie
replaces the mm on the __running__ task, therefore, we have to
increase the attach count of the new mm.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cio: fix I/O cancel function
Peter Oberparleiter [Mon, 25 Oct 2010 14:10:44 +0000 (16:10 +0200)]
[S390] cio: fix I/O cancel function

Function ccw_device_cancel_halt_clear may cause an unexpected kernel
panic if a clear function is currently active at the subchannel for
which it is called. In that case, the iretry counter used to determine
the number of retries is never initialized, leading to an immediate
failure of the function which results in a kernel panic.

Fix this by initializing the iretry counter when the function is
first called. Also replace the kernel panic with a return code: a
single malfunctioning I/O device should not automatically cause a
system-wide kernel panic.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] topology: change default
Heiko Carstens [Mon, 25 Oct 2010 14:10:43 +0000 (16:10 +0200)]
[S390] topology: change default

Switch default value of the kernel parameter 'topology' from off to on.
Various performance measurements have finally shown that there are no
(known) regressions anywhere.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] smp: use correct cpu address in print_cpu_info()
Heiko Carstens [Mon, 25 Oct 2010 14:10:42 +0000 (16:10 +0200)]
[S390] smp: use correct cpu address in print_cpu_info()

Up to now print_cpu_info() uses the cpu address stored in it's local
lowcore to print a message to the console. The cpu address in the
lowcore is (in this case) however not the physical cpu address of the
local cpu. It's the address of the cpu that issued the sigp restart
which started the local cpu.
Fix this by using the store cpu address instruction instead.
It's not that anybody really cares since this is broken since more than
ten years...

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] remove ieee_instruction_pointer from thread_struct
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:41 +0000 (16:10 +0200)]
[S390] remove ieee_instruction_pointer from thread_struct

The ieee_instruction_pointer can not be read from user space anymore
since git commit 613e1def6b52c399a8b72a5e11bc2e57d2546fb8, the ptrace
interface always returns zero. Remove it from the thread_struct. It
is still present in the user_regs_struct for compatability reasons.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cleanup system call parameter setup
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:40 +0000 (16:10 +0200)]
[S390] cleanup system call parameter setup

Do the setup of the stack overflow argument for the sixth system
call parameter right before the branch to the system call function.
That simplifies the system call parameter access code.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] correct alignment of cpuid structure
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:39 +0000 (16:10 +0200)]
[S390] correct alignment of cpuid structure

The store-cpu-id instruction has a minimum alignment of 8. Reflect
that in the definition of struct cpuid.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cleanup lowcore access from external interrupts
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:38 +0000 (16:10 +0200)]
[S390] cleanup lowcore access from external interrupts

Read external interrupts parameters from the lowcore in the first
level interrupt handler in entry[64].S.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cleanup lowcore access from program checks
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:37 +0000 (16:10 +0200)]
[S390] cleanup lowcore access from program checks

Read all required fields for program checks from the lowcore in the
first level interrupt handler in entry[64].S. If the context that
caused the fault was enabled for interrupts we can now re-enable the
irqs in entry[64].S.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] pgtable: move pte_mkhuge() from hugetlb.h to pgtable.h
Heiko Carstens [Mon, 25 Oct 2010 14:10:36 +0000 (16:10 +0200)]
[S390] pgtable: move pte_mkhuge() from hugetlb.h to pgtable.h

All architectures besides s390 have pte_mkhuge() defined in pgtable.h.
So move the function to pgtable.h on s390 as well.
Fixes a compile error introduced with "hugetlb: hugepage migration core"
in linux-next which only happens on s390.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] fix SIGBUS handling
Martin Schwidefsky [Mon, 25 Oct 2010 14:10:35 +0000 (16:10 +0200)]
[S390] fix SIGBUS handling

Raise SIGBUS with a siginfo structure. Deliver BUS_ADRERR as si_code and
the address of the fault in the si_addr field.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cio: notify drivers of channel path events
Sebastian Ott [Mon, 25 Oct 2010 14:10:34 +0000 (16:10 +0200)]
[S390] cio: notify drivers of channel path events

This patch adds a notification mechanism to inform ccw drivers
about changes to channel paths, which occured while the device
is online.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] css: update subchannel description after hibernate
Sebastian Ott [Mon, 25 Oct 2010 14:10:33 +0000 (16:10 +0200)]
[S390] css: update subchannel description after hibernate

Update the subchannel descriptor while resuming from hibernate
in order to obtain current link addresses.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] css: update descriptor after hibernate
Sebastian Ott [Mon, 25 Oct 2010 14:10:32 +0000 (16:10 +0200)]
[S390] css: update descriptor after hibernate

Update the channel path descriptors after hibernation.
This is done unlocked, since we are the only active
task at this time.

Note: chsc_determine_base_channel_path_desc is changed
to use spin_lock_irqsave, since it's called with
interrupts disabled in this case.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cio: update descriptor in chsc_chp_vary
Sebastian Ott [Mon, 25 Oct 2010 14:10:31 +0000 (16:10 +0200)]
[S390] cio: update descriptor in chsc_chp_vary

Update the channel path descriptor at the beginning of to the
vary_on operation.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] chsc: use the global page to determine the chp desriptor
Sebastian Ott [Mon, 25 Oct 2010 14:10:30 +0000 (16:10 +0200)]
[S390] chsc: use the global page to determine the chp desriptor

chsc_determine_channel_path_desc is called by a wrapper
who allocates a response struct. The response data
is then memcpy'ed to this response struct by
chsc_determine_channel_path_desc.

Change chsc_determine_base_channel_path_desc to use the
global chsc_page and deliver it to the function doing
the actual chsc call. The channel path desriptor is
then directly read from the response data.

As a result we get rid of the additional allocation
for the response struct.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] chsc: consolidate memory allocations
Sebastian Ott [Mon, 25 Oct 2010 14:10:29 +0000 (16:10 +0200)]
[S390] chsc: consolidate memory allocations

Most wrappers around the channel subsystem call have their own logic
to allocate memory (with proper alignment) or use preallocated or
static memory. This patch converts most users of the channel
subsystem call to use the same preallocated page (proteced by a
spinlock).

Note: The sei_page which is used in our crw handler to call
"store event information" has to coexist, since
a) in crw context, while accessing the sei_page, sleeping is allowed
   (which will conflict with the spinlock protection of the chsc_page)
b) in crw context, while accessing the sei_page, channel subsystem
   calls are allowed (which itself would require the page).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] chsc: initialization fixes
Sebastian Ott [Mon, 25 Oct 2010 14:10:28 +0000 (16:10 +0200)]
[S390] chsc: initialization fixes

This patch fixes:
 * kfree vs. free_page usage
 * structure definition for determine_css_characteristics
 * naming convention for the chsc init function
 * deregistration of crw handlers in the cleanup path

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cio: add lock to struct channel_path
Sebastian Ott [Mon, 25 Oct 2010 14:10:27 +0000 (16:10 +0200)]
[S390] cio: add lock to struct channel_path

Serialize access to members of struct channel_path with a mutex.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cio: fix memleak in resume path
Sebastian Ott [Mon, 25 Oct 2010 14:10:26 +0000 (16:10 +0200)]
[S390] cio: fix memleak in resume path

If a ccwdevice is lost during hibernation and a different
ccwdevice is attached to the same subchannel, we will
deregister the old ccw device and register the new one.

Since deregistration is not allowed in this context, we
handle this action later. However, some parts of the
registration process for the new device were started anyway,
so that the old device structure is no longer accessible.

Fix this by deferring both actions to the afterwards
scheduled subchannel event.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cio: remove custom implementation of hex_to_bin()
Andy Shevchenko [Mon, 25 Oct 2010 14:10:25 +0000 (16:10 +0200)]
[S390] cio: remove custom implementation of hex_to_bin()

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] dasd fix dump_sense_dbf
Stefan Haberland [Mon, 25 Oct 2010 14:10:24 +0000 (16:10 +0200)]
[S390] dasd fix dump_sense_dbf

The dasd_eckd_dump_sense_dbf function uses a macro for s390 debug
feature that can handle up to 8 parameters (for the DASD device
driver).
Fix the function to use only the maximum number of parameters.

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] css: fix sparse warning
Sebastian Ott [Mon, 25 Oct 2010 14:10:23 +0000 (16:10 +0200)]
[S390] css: fix sparse warning

fix this sparse warning:

drivers/s390/cio/css.c:580:6: warning: symbol 'css_schedule_eval_all_unreg'
was not declared. Should it be static?

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] hvc_iucv: do not call iucv_unregister if iucv_register failed
Hendrik Brueckner [Mon, 25 Oct 2010 14:10:22 +0000 (16:10 +0200)]
[S390] hvc_iucv: do not call iucv_unregister if iucv_register failed

If the iucv_register() functions fails, the error recovery calls
iucv_unregister() which might cause the following stack backtrace:

(<0000000000100ab2> show_trace+0xee/0x144)
<00000000004f1842> panic+0xb6/0x248
<00000000001010a6> die+0x15a/0x16c
<000000000011d936> do_no_context+0xa6/0xe4
<00000000004f84dc> do_protection_exception+0x2e8/0x3a4
<0000000000113afc> pgm_exit+0x0/0x14
<00000000004e786e> iucv_unregister+0x5a/0x17c
(<00000000004e785e> iucv_unregister+0x4a/0x17c)
<000000000076de74> hvc_iucv_init+0x228/0x5dc
<00000000001000c2> do_one_initcall+0x3e/0x19c
<00000000007524a2> kernel_init+0x28e/0x404
<0000000000105dd6> kernel_thread_starter+0x6/0xc
<0000000000105dd0> kernel_thread_starter+0x0/0xc

Remove the call to iucv_unregister() and remove the goto label
as unregistering is the last step in the hvc_iucv initialization.
If iucv_register() fails, simply clean up hvc terminals and free
resources.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
14 years ago[S390] cmm: fix crash on case conversion
Heiko Carstens [Mon, 25 Oct 2010 14:10:21 +0000 (16:10 +0200)]
[S390] cmm: fix crash on case conversion

When the cmm module is compiled into the kernel it will crash when
writing to the R/O data section.
Reason is the lower to upper case conversion of the "sender" module
parameter which ignored the fact that the pointer is preinitialized.

Introduced with 41b42876 "cmm, smsgiucv_app: convert sender to
uppercase"

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>