openwrt/staging/blogic.git
10 years agoNFSD: Adds macro EX_UUID_LEN for exports uuid's length
Kinglong Mee [Fri, 23 May 2014 12:00:19 +0000 (20:00 +0800)]
NFSD: Adds macro EX_UUID_LEN for exports uuid's length

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Helper function for parsing uuid
Kinglong Mee [Fri, 23 May 2014 11:59:06 +0000 (19:59 +0800)]
NFSD: Helper function for parsing uuid

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFS4: Avoid NULL reference or double free in nfsd4_fslocs_free()
Kinglong Mee [Fri, 23 May 2014 11:57:49 +0000 (19:57 +0800)]
NFS4: Avoid NULL reference or double free in nfsd4_fslocs_free()

If fsloc_parse() failed at kzalloc(), fs/nfsd/export.c
 411
 412         fsloc->locations = kzalloc(fsloc->locations_count
 413                         * sizeof(struct nfsd4_fs_location), GFP_KERNEL);
 414         if (!fsloc->locations)
 415                 return -ENOMEM;

svc_export_parse() will call nfsd4_fslocs_free() with fsloc->locations = NULL,
so that, "kfree(fsloc->locations[i].path);" will cause a crash.

If fsloc_parse() failed after that, fsloc_parse() will call nfsd4_fslocs_free(),
and svc_export_parse() will call it again, so that, a double free is caused.

This patch checks the fsloc->locations, and set to NULL after it be freed.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: better reservation of head space for krb5
J. Bruce Fields [Mon, 12 May 2014 22:10:58 +0000 (18:10 -0400)]
nfsd4: better reservation of head space for krb5

RPC_MAX_AUTH_SIZE is scattered around several places.  Better to set it
once in the auth code, where this kind of estimate should be made.  And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: kill write32, write64
J. Bruce Fields [Sat, 22 Mar 2014 22:48:39 +0000 (18:48 -0400)]
nfsd4: kill write32, write64

And switch a couple other functions from the encode(&p,...) convention
to the p = encode(p,...) convention mostly used elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: kill WRITEMEM
J. Bruce Fields [Sat, 22 Mar 2014 22:43:16 +0000 (18:43 -0400)]
nfsd4: kill WRITEMEM

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: kill WRITE64
J. Bruce Fields [Sat, 22 Mar 2014 21:11:35 +0000 (17:11 -0400)]
nfsd4: kill WRITE64

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: kill WRITE32
J. Bruce Fields [Sat, 22 Mar 2014 21:09:18 +0000 (17:09 -0400)]
nfsd4: kill WRITE32

These macros just obscure what's going on.  Adopt the convention of the
client-side code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: really fix nfs4err_resource in 4.1 case
J. Bruce Fields [Thu, 8 May 2014 21:38:18 +0000 (17:38 -0400)]
nfsd4: really fix nfs4err_resource in 4.1 case

encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space.  That's not a legal error in the 4.1 case.
And in the 4.1 case, if we ran out of buffer space, we should have
exceeded a session limit too.

(Note in 1bc49d83c37cfaf46be357757e592711e67f9809 "nfsd4: fix
nfs4err_resource in 4.1 case" we originally tried fixing this error
return before fixing the problem that we could error out while we still
had lots of available space.  The result was to trade one illegal error
for another in those cases.  We decided that was helpful, so reverted
the change in fc208d026be0c7d60db9118583fc62f6ca97743d, and are only
reinstating it now that we've elimited almost all of those cases.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: allow exotic read compounds
J. Bruce Fields [Tue, 18 Mar 2014 21:44:10 +0000 (17:44 -0400)]
nfsd4: allow exotic read compounds

I'm not sure why a client would want to stuff multiple reads in a
single compound rpc, but it's legal for them to do it, and we should
really support it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: more read encoding cleanup
J. Bruce Fields [Tue, 13 May 2014 21:27:03 +0000 (17:27 -0400)]
nfsd4: more read encoding cleanup

More cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: read encoding cleanup
J. Bruce Fields [Tue, 13 May 2014 20:32:04 +0000 (16:32 -0400)]
nfsd4: read encoding cleanup

Trivial cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: separate splice and readv cases
J. Bruce Fields [Tue, 18 Mar 2014 21:01:51 +0000 (17:01 -0400)]
nfsd4: separate splice and readv cases

The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.

It is probably clearer to separate the two cases entirely.

There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: nfsd_vfs_read doesn't use file handle parameter
J. Bruce Fields [Fri, 21 Mar 2014 01:32:04 +0000 (21:32 -0400)]
nfsd4: nfsd_vfs_read  doesn't use file handle parameter

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: turn off zero-copy-read in exotic cases
J. Bruce Fields [Tue, 4 Feb 2014 15:36:59 +0000 (10:36 -0500)]
nfsd4: turn off zero-copy-read in exotic cases

We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.

While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future proof
the server we should lift the limitation.

At the same time we'd like to continue to support zero-copy reads.

Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.

So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.

This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.

Later patches handle those "exotic compounds", this one just makes sure
zero-copy is turned off in those cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: estimate sequence response size
J. Bruce Fields [Sun, 23 Mar 2014 16:34:22 +0000 (09:34 -0700)]
nfsd4: estimate sequence response size

Otherwise a following patch would turn off all 4.1 zero-copy reads.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: better estimate of getattr response size
J. Bruce Fields [Sun, 23 Mar 2014 16:01:48 +0000 (12:01 -0400)]
nfsd4: better estimate of getattr response size

We plan to use this estimate to decide whether or not to allow zero-copy
reads.  Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: don't treat readlink like a zero-copy operation
J. Bruce Fields [Mon, 20 Jan 2014 22:08:27 +0000 (17:08 -0500)]
nfsd4: don't treat readlink like a zero-copy operation

There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle.  (In practice
I can't see why a client would want e.g. multiple readlink calls in a
comound, but it's probably a spec violation for us not to handle it.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: enforce rd_dircount
J. Bruce Fields [Fri, 21 Mar 2014 01:20:26 +0000 (21:20 -0400)]
nfsd4: enforce rd_dircount

As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: allow large readdirs
J. Bruce Fields [Mon, 20 Jan 2014 21:37:11 +0000 (16:37 -0500)]
nfsd4: allow large readdirs

Currently we limit readdir results to a single page.  This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: use session limits to release send buffer reservation
J. Bruce Fields [Fri, 21 Mar 2014 00:47:41 +0000 (20:47 -0400)]
nfsd4: use session limits to release send buffer reservation

Once we know the limits the session places on the size of the rpc, we
can also use that information to release any unnecessary reserved reply
buffer space.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: adjust buflen to session channel limit
J. Bruce Fields [Thu, 13 Mar 2014 01:39:35 +0000 (21:39 -0400)]
nfsd4: adjust buflen to session channel limit

We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agorpc: define xdr_restrict_buflen
J. Bruce Fields [Thu, 6 Mar 2014 18:22:18 +0000 (13:22 -0500)]
rpc: define xdr_restrict_buflen

With this xdr_reserve_space can help us enforce various limits.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: fix buflen calculation after read encoding
J. Bruce Fields [Mon, 19 May 2014 20:18:23 +0000 (16:18 -0400)]
nfsd4: fix buflen calculation after read encoding

We don't necessarily want to assume that the buflen is the same
as the number of bytes available in the pages.  We may have some reason
to set it to something less (for example, later patches will use a
smaller buflen to enforce session limits).

So, calculate the buflen relative to the previous buflen instead of
recalculating it from scratch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: nfsd4_check_resp_size should check against whole buffer
J. Bruce Fields [Tue, 11 Mar 2014 21:58:57 +0000 (17:58 -0400)]
nfsd4: nfsd4_check_resp_size should check against whole buffer

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: minor encode_read cleanup
J. Bruce Fields [Tue, 11 Mar 2014 20:51:23 +0000 (16:51 -0400)]
nfsd4: minor encode_read cleanup

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: more precise nfsd4_max_reply
J. Bruce Fields [Tue, 11 Mar 2014 19:39:13 +0000 (15:39 -0400)]
nfsd4: more precise nfsd4_max_reply

It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends.  (Thanks to Christoph Hellwig for pointing
out that simplification.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: don't try to encode conflicting owner if low on space
J. Bruce Fields [Mon, 10 Mar 2014 16:19:10 +0000 (12:19 -0400)]
nfsd4: don't try to encode conflicting owner if low on space

I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long.  In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: convert 4.1 replay encoding
J. Bruce Fields [Fri, 21 Mar 2014 21:57:57 +0000 (17:57 -0400)]
nfsd4: convert 4.1 replay encoding

Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads.  That's an odd corner
case as clients wouldn't normally ask those to be cached.

in any case, this seems a little more robust.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: allow encoding across page boundaries
J. Bruce Fields [Mon, 26 Aug 2013 20:04:46 +0000 (16:04 -0400)]
nfsd4: allow encoding across page boundaries

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: size-checking cleanup
J. Bruce Fields [Tue, 11 Mar 2014 21:54:34 +0000 (17:54 -0400)]
nfsd4: size-checking cleanup

Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: remove redundant encode buffer size checking
J. Bruce Fields [Sat, 8 Mar 2014 22:19:55 +0000 (17:19 -0500)]
nfsd4: remove redundant encode buffer size checking

Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: nfsd4_check_resp_size needn't recalculate length
J. Bruce Fields [Sat, 8 Mar 2014 21:42:05 +0000 (16:42 -0500)]
nfsd4: nfsd4_check_resp_size needn't recalculate length

We're keeping the length updated as we go now, so there's no need for
the extra calculation here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: reserve space before inlining 0-copy pages
J. Bruce Fields [Sat, 22 Mar 2014 19:15:11 +0000 (15:15 -0400)]
nfsd4: reserve space before inlining 0-copy pages

Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding.  (xdr_truncate_encode doesn't handle
that case.)  So, make sure we'll have adequate space to finish the
operation first.

For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
but we want to remove those checks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: teach encoders to handle reserve_space failures
J. Bruce Fields [Thu, 30 Jan 2014 22:18:38 +0000 (17:18 -0500)]
nfsd4: teach encoders to handle reserve_space failures

We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
  more protocol.
- BUG_ON or page faulting on failure seems overly fragile.
- Especially in the 4.1 case, we prefer not to fail compounds
  just because the returned result came *close* to session
  limits.  (Though perfect enforcement here may be difficult.)
- I'd prefer encoding to be uniform for all encoders instead of
  having special exceptions for encoders containing, for
  example, attributes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: "backfill" using write_bytes_to_xdr_buf
J. Bruce Fields [Thu, 29 Aug 2013 19:42:52 +0000 (15:42 -0400)]
nfsd4: "backfill" using write_bytes_to_xdr_buf

Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to.  We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: use xdr_truncate_encode
J. Bruce Fields [Thu, 27 Feb 2014 01:17:02 +0000 (20:17 -0500)]
nfsd4: use xdr_truncate_encode

Now that lengths are reliable, we can use xdr_truncate instead of
open-coding it everywhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agorpc: xdr_truncate_encode
J. Bruce Fields [Tue, 25 Feb 2014 22:44:21 +0000 (17:44 -0500)]
rpc: xdr_truncate_encode

This will be used in the server side in a few cases:
- when certain operations (read, readdir, readlink) fail after
  encoding a partial response.
- when we run out of space after encoding a partial response.
- in readlink, where we initially reserve PAGE_SIZE bytes for
  data, then truncate to the actual size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: keep xdr buf length updated
J. Bruce Fields [Wed, 26 Feb 2014 22:39:35 +0000 (17:39 -0500)]
nfsd4: keep xdr buf length updated

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: no need for encode_compoundres to adjust lengths
J. Bruce Fields [Wed, 26 Feb 2014 22:16:27 +0000 (17:16 -0500)]
nfsd4: no need for encode_compoundres to adjust lengths

xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: remove ADJUST_ARGS
J. Bruce Fields [Fri, 31 Jan 2014 16:19:41 +0000 (11:19 -0500)]
nfsd4: remove ADJUST_ARGS

It's just uninteresting debugging code at this point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: use xdr_stream throughout compound encoding
J. Bruce Fields [Wed, 26 Feb 2014 22:00:38 +0000 (17:00 -0500)]
nfsd4: use xdr_stream throughout compound encoding

Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: use xdr_reserve_space in attribute encoding
J. Bruce Fields [Wed, 28 Aug 2013 01:32:25 +0000 (21:32 -0400)]
nfsd4: use xdr_reserve_space in attribute encoding

This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up by as necessary after the fact by
read-link operations and by nfs4svc_encode_compoundres.  However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
the head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far.  We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: allow space for final error return
J. Bruce Fields [Fri, 7 Mar 2014 16:01:37 +0000 (11:01 -0500)]
nfsd4: allow space for final error return

This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: fix encoding of out-of-space replies
J. Bruce Fields [Fri, 7 Mar 2014 01:39:29 +0000 (20:39 -0500)]
nfsd4: fix encoding of out-of-space replies

If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: reserve head space for krb5 integ/priv info
J. Bruce Fields [Tue, 21 Jan 2014 16:06:52 +0000 (11:06 -0500)]
nfsd4: reserve head space for krb5 integ/priv info

Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client.  But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good.  Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overstimate of what's required, we
should probably try for a tighter limit at some point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: move proc_compound xdr encode init to helper
J. Bruce Fields [Wed, 15 Jan 2014 21:26:42 +0000 (16:26 -0500)]
nfsd4: move proc_compound xdr encode init to helper

Mechanical transformation with no change of behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: tweak nfsd4_encode_getattr to take xdr_stream
J. Bruce Fields [Mon, 26 Aug 2013 20:04:46 +0000 (16:04 -0400)]
nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

Just change the nfsd4_encode_getattr api.  Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: embed xdr_stream in nfsd4_compoundres
J. Bruce Fields [Wed, 15 Jan 2014 20:17:58 +0000 (15:17 -0500)]
nfsd4: embed xdr_stream in nfsd4_compoundres

This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: decoding errors can still be cached and require space
J. Bruce Fields [Mon, 19 May 2014 16:27:11 +0000 (12:27 -0400)]
nfsd4: decoding errors can still be cached and require space

Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding.  I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns.  Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: fix write reply size estimate
J. Bruce Fields [Sat, 17 May 2014 01:24:56 +0000 (21:24 -0400)]
nfsd4: fix write reply size estimate

The write reply also includes count and stable_how.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: read size estimate should include padding
J. Bruce Fields [Fri, 16 May 2014 18:38:20 +0000 (14:38 -0400)]
nfsd4: read size estimate should include padding

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: allow larger 4.1 session drc slots
J. Bruce Fields [Wed, 12 Mar 2014 19:17:18 +0000 (15:17 -0400)]
nfsd4: allow larger 4.1 session drc slots

The client is actually asking for 2532 bytes.  I suspect that's a
mistake.  But maybe we can allow some more.  In theory lock needs more
if it might return a maximum-length lockowner in the denied case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: READ, READDIR, etc., are idempotent
J. Bruce Fields [Fri, 7 Mar 2014 16:43:58 +0000 (11:43 -0500)]
nfsd4: READ, READDIR, etc., are idempotent

OP_MODIFIES_SOMETHING flags operations that we should be careful not to
initiate without being sure we have the buffer space to encode a reply.

None of these ops fall into that category.

We could probably remove a few more, but this isn't a very important
problem at least for ops whose reply size is easy to estimate.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Only set PF_LESS_THROTTLE when really needed.
NeilBrown [Mon, 12 May 2014 01:22:47 +0000 (11:22 +1000)]
nfsd: Only set PF_LESS_THROTTLE when really needed.

PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
and live-locks while writing to the page cache in a loop-back
NFS mount situation.

It therefore makes sense to *only* set PF_LESS_THROTTLE in this
situation.
We now know when a request came from the local-host so it could be a
loop-back mount.  We already know when we are handling write requests,
and when we are doing anything else.

So combine those two to allow nfsd to still be throttled (like any
other process) in every situation except when it is known to be
problematic.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: track whether a request is coming from a loop-back interface.
NeilBrown [Mon, 12 May 2014 01:22:47 +0000 (11:22 +1000)]
SUNRPC: track whether a request is coming from a loop-back interface.

If an incoming NFS request is coming from the local host, then
nfsd will need to perform some special handling.  So detect that
possibility and make the source visible in rq_local.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: Fix a module reference leak in svc_handle_xprt
Trond Myklebust [Sun, 18 May 2014 18:05:22 +0000 (14:05 -0400)]
SUNRPC: Fix a module reference leak in svc_handle_xprt

If the accept() call fails, we need to put the module reference.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Ignore client's source port on RDMA transports
Chuck Lever [Mon, 19 May 2014 17:40:22 +0000 (13:40 -0400)]
NFSD: Ignore client's source port on RDMA transports

An NFS/RDMA client's source port is meaningless for RDMA transports.
The transport layer typically sets the source port value on the
connection to a random ephemeral port.

Currently, NFS server administrators must specify the "insecure"
export option to enable clients to access exports via RDMA.

But this means NFS clients can access such an export via IP using an
ephemeral port, which may not be desirable.

This patch eliminates the need to specify the "insecure" export
option to allow NFS/RDMA clients access to an export.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: remove nfsd4_free_slab
Christoph Hellwig [Wed, 21 May 2014 14:43:03 +0000 (07:43 -0700)]
nfsd: remove nfsd4_free_slab

No need for a kmem_cache_destroy wrapper in nfsd, just do proper
goto based unwinding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: Remove assignments inside conditions
Benoit Taine [Thu, 22 May 2014 14:32:30 +0000 (16:32 +0200)]
nfsd: Remove assignments inside conditions

Assignments should not happen inside an if conditional, but in the line
before. This issue was reported by checkpatch.

The semantic patch that makes this change is as follows
(http://coccinelle.lip6.fr/):

// <smpl>

@@
identifier i1;
expression e1;
statement S;
@@
-if(!(i1 = e1)) S
+i1 = e1;
+if(!i1)
+S

// </smpl>

It has been tested by compilation.

Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoMerge 3.15 bugfixes for 3.16
J. Bruce Fields [Thu, 22 May 2014 19:48:11 +0000 (15:48 -0400)]
Merge 3.15 bugfixes for 3.16

10 years agonfsd4: fix delegation cleanup on error
J. Bruce Fields [Mon, 3 Mar 2014 17:19:18 +0000 (12:19 -0500)]
nfsd4: fix delegation cleanup on error

We're not cleaning up everything we need to on error.  In particular,
we're not removing our lease.  Among other problems this can cause the
struct nfs4_file used as fl_owner to be referenced after it has been
destroyed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Don't clear SUID/SGID after root writing data
Kinglong Mee [Fri, 18 Apr 2014 16:17:31 +0000 (00:17 +0800)]
NFSD: Don't clear SUID/SGID after root writing data

We're clearing the SUID/SGID bits on write by hand in nfsd_vfs_write,
even though the subsequent vfs_writev() call will end up doing this for
us (through file system write methods eventually calling
file_remove_suid(), e.g., from __generic_file_aio_write).

So, remove the redundant nfsd code.

The only change in behavior is when the write is by root, in which case
we previously cleared SUID/SGID, but will now leave it alone.  The new
behavior is the behavior of every filesystem we've checked.

It seems better to be consistent with local filesystem behavior.  And
the security advantage seems limited as root could always restore these
bits by hand if it wanted.

SUID/SGID is not cleared after writing data with (root, local ext4),
   File: ‘test’
   Size: 0               Blocks: 0          IO Block: 4096   regular
empty file
Device: 803h/2051d      Inode: 1200137     Links: 1
Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.016029014 +0800
Change: 2014-04-18 21:36:31.026030285 +0800
  Birth: -
   File: ‘test’
   Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d      Inode: 1200137     Links: 1
Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.040032065 +0800
Change: 2014-04-18 21:36:31.040032065 +0800
  Birth: -

With no_root_squash, (root, remote ext4), SUID/SGID are cleared,
   File: ‘test’
   Size: 0               Blocks: 0          IO Block: 262144 regular
empty file
Device: 24h/36d Inode: 786439      Links: 1
Access: (4777/-rwsrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.155805097 +0800
Change: 2014-04-18 21:45:32.168806749 +0800
  Birth: -
   File: ‘test’
   Size: 5               Blocks: 8          IO Block: 262144 regular file
Device: 24h/36d Inode: 786439      Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.184808783 +0800
Change: 2014-04-18 21:45:32.184808783 +0800
  Birth: -

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: warn on finding lockowner without stateid's
J. Bruce Fields [Thu, 8 May 2014 15:19:41 +0000 (11:19 -0400)]
nfsd4: warn on finding lockowner without stateid's

The current code assumes a one-to-one lockowner<->lock stateid
correspondance.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: remove lockowner when removing lock stateid
J. Bruce Fields [Tue, 20 May 2014 19:55:21 +0000 (15:55 -0400)]
nfsd4: remove lockowner when removing lock stateid

The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.

We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.

Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd4: fix corruption on setting an ACL.
J. Bruce Fields [Thu, 15 May 2014 01:57:26 +0000 (21:57 -0400)]
nfsd4: fix corruption on setting an ACL.

As of 06f9cc12caa862f5bc86ebdb4f77568a4bef0167 "nfsd4: don't create
unnecessary mask acl", any non-trivial ACL will be left with an
unitialized entry, and a trivial ACL may write one entry beyond what's
allocated.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Get rid of empty function nfs4_state_init
Kinglong Mee [Tue, 8 Apr 2014 05:06:28 +0000 (13:06 +0800)]
NFSD: Get rid of empty function nfs4_state_init

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Use simple_read_from_buffer for coping data to userspace
Kinglong Mee [Tue, 8 Apr 2014 05:04:01 +0000 (13:04 +0800)]
NFSD: Use simple_read_from_buffer for coping data to userspace

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoSUNRPC: Fix printk that is not only for nfsd
Kinglong Mee [Tue, 15 Apr 2014 09:13:56 +0000 (17:13 +0800)]
SUNRPC: Fix printk that is not only for nfsd

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoMerge 3.15 bugfix for 3.16
J. Bruce Fields [Thu, 8 May 2014 18:58:42 +0000 (14:58 -0400)]
Merge 3.15 bugfix for 3.16

10 years agonfsd: clean up fh_auth usage
Christoph Hellwig [Wed, 7 May 2014 11:49:44 +0000 (13:49 +0200)]
nfsd: clean up fh_auth usage

Use fh_fsid when reffering to the fsid part of the filehandle.  The
variable length auth field envisioned in nfsfh wasn't ever implemented.
Also clean up some lose ends around this and document the file handle
format better.

Btw, why do we even export nfsfh.h to userspace?  The file handle very
much is kernel private, and nothing in nfs-utils include the header
either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: cleanup unneeded including linux/export.h
Kinglong Mee [Wed, 7 May 2014 15:08:04 +0000 (23:08 +0800)]
NFSD: cleanup unneeded including linux/export.h

commit 4ac7249ea5a0ceef9f8269f63f33cc873c3fac61 have remove all EXPORT_SYMBOL,
linux/export.h is not needed, just clean it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSD: Call ->set_acl with a NULL ACL structure if no entries
Kinglong Mee [Fri, 18 Apr 2014 12:49:04 +0000 (20:49 +0800)]
NFSD: Call ->set_acl with a NULL ACL structure if no entries

After setting ACL for directory, I got two problems that caused
by the cached zero-length default posix acl.

This patch make sure nfsd4_set_nfs4_acl calls ->set_acl
with a NULL ACL structure if there are no entries.

Thanks for Christoph Hellwig's advice.

First problem:
............ hang ...........

Second problem:
[ 1610.167668] ------------[ cut here ]------------
[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
[last unloaded: nfsd]
[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G           OE
3.15.0-rc1+ #15
[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
ffff88005a944000
[ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>]  [<ffffffffa034d5ed>]
_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
[ 1610.168320] RSP: 0018:ffff88005a945b00  EFLAGS: 00010293
[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
0000000000000000
[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
ffff880068233300
[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
0000000000000000
[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
ffff880068233300
[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
ffff880068233300
[ 1610.168320] FS:  0000000000000000(0000) GS:ffff880077800000(0000)
knlGS:0000000000000000
[ 1610.168320] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
00000000000006f0
[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1610.168320] Stack:
[ 1610.168320]  ffffffff00000000 0000000b67c83500 000000076700bac0
0000000000000000
[ 1610.168320]  ffff88006700bac0 ffff880068233300 ffff88005a945c08
0000000000000002
[ 1610.168320]  0000000000000000 ffff88005a945b88 ffffffffa034e2d5
000000065a945b68
[ 1610.168320] Call Trace:
[ 1610.168320]  [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
[ 1610.168320]  [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
[ 1610.168320]  [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
[ 1610.168320]  [<ffffffffa0327962>] ?
nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
[ 1610.168320]  [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
[ 1610.168320]  [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
[ 1610.168320]  [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
[nfsd]
[ 1610.168320]  [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
[ 1610.168320]  [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[ 1610.168320]  [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
[ 1610.168320]  [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
[ 1610.168320]  [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
[ 1610.168320]  [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1610.168320]  [<ffffffff810a5202>] kthread+0xd2/0xf0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320]  [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
[ 1610.168320] RIP  [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
[nfsd]
[ 1610.168320]  RSP <ffff88005a945b00>
[ 1610.257313] ---[ end trace 838254e3e352285b ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSd: Clean up nfs4_preprocess_stateid_op
Trond Myklebust [Fri, 18 Apr 2014 18:44:07 +0000 (14:44 -0400)]
NFSd: Clean up nfs4_preprocess_stateid_op

Move the state locking and file descriptor reference out from the
callers and into nfs4_preprocess_stateid_op() itself.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSd: Mark nfs4_free_lockowner and nfs4_free_openowner as static functions
Trond Myklebust [Fri, 18 Apr 2014 18:44:03 +0000 (14:44 -0400)]
NFSd: Mark nfs4_free_lockowner and nfs4_free_openowner as static functions

They do not need to be used outside fs/nfsd/nfs4state.c

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: remove <linux/nfsd/debug.h>
Christoph Hellwig [Tue, 6 May 2014 17:37:16 +0000 (19:37 +0200)]
nfsd: remove <linux/nfsd/debug.h>

There is almost nothing left it in, just merge it into the only file
that includes it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: move <linux/nfsd/stats.h> to fs/nfsd
Christoph Hellwig [Tue, 6 May 2014 17:37:15 +0000 (19:37 +0200)]
nfsd: move <linux/nfsd/stats.h> to fs/nfsd

There are no legitimate users outside of fs/nfsd, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: move <linux/nfsd/export.h> to fs/nfsd
Christoph Hellwig [Tue, 6 May 2014 17:37:14 +0000 (19:37 +0200)]
nfsd: move <linux/nfsd/export.h> to fs/nfsd

There are no legitimate users outside of fs/nfsd, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: remove <linux/nfsd/nfsfh.h>
Christoph Hellwig [Tue, 6 May 2014 17:37:13 +0000 (19:37 +0200)]
nfsd: remove <linux/nfsd/nfsfh.h>

The only real user of this header is fs/nfsd/nfsfh.h, so merge the
two.  Various lockѕ source files used it to indirectly get other
sunrpc or nfs headers, so fix those up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSd: Remove 'inline' designation for free_client()
Trond Myklebust [Fri, 18 Apr 2014 18:43:58 +0000 (14:43 -0400)]
NFSd: Remove 'inline' designation for free_client()

It is large, it is used in more than one place, and it is not performance
critical. Let gcc figure out whether it should be inlined...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agolockd: avoid warning when CONFIG_SYSCTL undefined
Kees Cook [Thu, 1 May 2014 21:15:02 +0000 (14:15 -0700)]
lockd: avoid warning when CONFIG_SYSCTL undefined

When building without CONFIG_SYSCTL, the compiler saw an unused
label. This moves the label into the #ifdef it is used under.

fs/lockd/svc.c: In function ‘init_nlm’:
fs/lockd/svc.c:626:1: warning: label ‘err_sysctl’ defined but not used [-Wunused-label]

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSd: call rpc_destroy_wait_queue() from free_client()
Trond Myklebust [Fri, 18 Apr 2014 18:43:57 +0000 (14:43 -0400)]
NFSd: call rpc_destroy_wait_queue() from free_client()

Mainly to ensure that we don't leave any hanging timers.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoNFSd: Move default initialisers from create_client() to alloc_client()
Trond Myklebust [Fri, 18 Apr 2014 18:43:56 +0000 (14:43 -0400)]
NFSd: Move default initialisers from create_client() to alloc_client()

Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoRevert "nfsd4: fix nfs4err_resource in 4.1 case"
J. Bruce Fields [Wed, 9 Apr 2014 15:07:01 +0000 (11:07 -0400)]
Revert "nfsd4: fix nfs4err_resource in 4.1 case"

Since we're still limiting attributes to a page, the result here is that
a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE
instead of NFS4ERR_RESOURCE.

Both error returns are wrong, and the real bug here is the arbitrary
limit on getattr results, fixed by as-yet out-of-tree patches.  But at a
minimum we can make life easier for clients by sticking to one broken
behavior in released kernels instead of two....

Trond says:

one immediate consequence of this patch will be that NFSv4.1
clients will now report EIO instead of EREMOTEIO if they hit the
problem. That may make debugging a little less obvious.

Another consequence will be that if we ever do try to add client
side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal
with the “handle existing buggy server” syndrome.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agonfsd: set timeparms.to_maxval in setup_callback_client
Jeff Layton [Tue, 15 Apr 2014 12:51:48 +0000 (08:51 -0400)]
nfsd: set timeparms.to_maxval in setup_callback_client

...otherwise the logic in the timeout handling doesn't work correctly.

Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agolocks: allow __break_lease to sleep even when break_time is 0
Jeff Layton [Tue, 15 Apr 2014 12:44:12 +0000 (08:44 -0400)]
locks: allow __break_lease to sleep even when break_time is 0

A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.

Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This causes __break_lease to spin in a tight loop and
causes soft lockups.

Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.

Cc: <stable@vger.kernel.org>
Cc: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Terry Barnaby <terry1@beam.ltd.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
10 years agoLinux 3.15-rc1
Linus Torvalds [Sun, 13 Apr 2014 21:18:35 +0000 (14:18 -0700)]
Linux 3.15-rc1

10 years agomm: Initialize error in shmem_file_aio_read()
Geert Uytterhoeven [Sun, 13 Apr 2014 18:46:22 +0000 (20:46 +0200)]
mm: Initialize error in shmem_file_aio_read()

Some versions of gcc even warn about it:

  mm/shmem.c: In function ‘shmem_file_aio_read’:
  mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function

If the loop is aborted during the first iteration by one of the two
first break statements, error will be uninitialized.

Introduced by commit 6e58e79db8a1 ("introduce copy_page_to_iter, kill
loop over iovec in generic_file_aio_read()").

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agocifs: Use min_t() when comparing "size_t" and "unsigned long"
Geert Uytterhoeven [Sun, 13 Apr 2014 18:46:21 +0000 (20:46 +0200)]
cifs: Use min_t() when comparing "size_t" and "unsigned long"

On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":

  fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
  fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast

Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoMerge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg...
Linus Torvalds [Sun, 13 Apr 2014 20:28:13 +0000 (13:28 -0700)]
Merge branch 'slab/next' of git://git./linux/kernel/git/penberg/linux

Pull slab changes from Pekka Enberg:
 "The biggest change is byte-sized freelist indices which reduces slab
  freelist memory usage:

    https://lkml.org/lkml/2013/12/2/64"

* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
  mm: slab/slub: use page->list consistently instead of page->lru
  mm/slab.c: cleanup outdated comments and unify variables naming
  slab: fix wrongly used macro
  slub: fix high order page allocation problem with __GFP_NOFAIL
  slab: Make allocations with GFP_ZERO slightly more efficient
  slab: make more slab management structure off the slab
  slab: introduce byte sized index for the freelist of a slab
  slab: restrict the number of objects in a slab
  slab: introduce helper functions to get/set free object
  slab: factor out calculate nr objects in cache_estimate

10 years agoMerge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Linus Torvalds [Sun, 13 Apr 2014 01:22:27 +0000 (18:22 -0700)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild

Pull misc kbuild changes from Michal Marek:
 "Here is the non-critical part of kbuild:
   - One bogus coccinelle check removed, one check fixed not to suggest
     the obsolete PTR_RET macro
   - scripts/tags.sh does not index the generated *.mod.c files
   - new objdiff tool to list differences between two versions of an
     object file
   - A fix for scripts/bootgraph.pl"

* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  scripts/coccinelle: Use PTR_ERR_OR_ZERO
  scripts/bootgraph.pl: Add graphic header
  scripts: objdiff: detect object code changes between two commits
  Coccicheck: Remove memcpy to struct assignment test
  scripts/tags.sh: Ignore *.mod.c

10 years agosym53c8xx_2: Set DID_REQUEUE return code when aborting squeue
Mikulas Patocka [Wed, 9 Apr 2014 01:52:05 +0000 (21:52 -0400)]
sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue

This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.

When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.

This function aborts them with DID_SOFT_ERROR.

If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.

The result is that disk returning QUEUE FULL causes request failures.

The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags.  The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agopowerpc: Don't try to set LPCR unless we're in hypervisor mode
Paul Mackerras [Fri, 11 Apr 2014 06:43:35 +0000 (16:43 +1000)]
powerpc: Don't try to set LPCR unless we're in hypervisor mode

Commit 8f619b5429d9 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode.  The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic.  The visible result is that
the kernel hangs after printing "returning from prom_init".

This fixes it by checking for hypervisor mode being available before
setting LPCR.  If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agofutex: update documentation for ordering guarantees
Davidlohr Bueso [Wed, 9 Apr 2014 18:55:07 +0000 (11:55 -0700)]
futex: update documentation for ordering guarantees

Commits 11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and 69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.

The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sun, 13 Apr 2014 00:31:22 +0000 (17:31 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull yet more networking updates from David Miller:

 1) Various fixes to the new Redpine Signals wireless driver, from
    Fariya Fatima.

 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
    Dmitry Petukhov.

 3) UFO and TSO packets differ in whether they include the protocol
    header in gso_size, account for that in skb_gso_transport_seglen().
   From Florian Westphal.

 4) If VLAN untagging fails, we double free the SKB in the bridging
    output path.  From Toshiaki Makita.

 5) Several call sites of sk->sk_data_ready() were referencing an SKB
    just added to the socket receive queue in order to calculate the
    second argument via skb->len.  This is dangerous because the moment
    the skb is added to the receive queue it can be consumed in another
    context and freed up.

    It turns out also that none of the sk->sk_data_ready()
    implementations even care about this second argument.

    So just kill it off and thus fix all these use-after-free bugs as a
    side effect.

 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.

 7) pktgen needs to do locking properly for LLTX devices, from Daniel
    Borkmann.

 8) xen-netfront driver initializes TX array entries in RX loop :-) From
    Vincenzo Maffione.

 9) After refactoring, some tunnel drivers allow a tunnel to be
    configured on top itself.  Fix from Nicolas Dichtel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vti: don't allow to add the same tunnel twice
  gre: don't allow to add the same tunnel twice
  drivers: net: xen-netfront: fix array initialization bug
  pktgen: be friendly to LLTX devices
  r8152: check RTL8152_UNPLUG
  net: sun4i-emac: add promiscuous support
  net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  net: ipv6: Fix oif in TCP SYN+ACK route lookup.
  drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
  drivers: net: cpsw: discard all packets received when interface is down
  net: Fix use after free by removing length arg from sk_data_ready callbacks.
  Drivers: net: hyperv: Address UDP checksum issues
  Drivers: net: hyperv: Negotiate suitable ndis version for offload support
  Drivers: net: hyperv: Allocate memory for all possible per-pecket information
  bridge: Fix double free and memory leak around br_allowed_ingress
  bonding: Remove debug_fs files when module init fails
  i40evf: program RSS LUT correctly
  i40evf: remove open-coded skb_cow_head
  ixgb: remove open-coded skb_cow_head
  igbvf: remove open-coded skb_cow_head
  ...

10 years agoMerge tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realm...
Linus Torvalds [Sun, 13 Apr 2014 00:26:45 +0000 (17:26 -0700)]
Merge tag 'blackfin-for-linus' of git://git./linux/kernel/git/realmz6/blackfin-linux

Pull blackfin updates from Steven Miao:
 "Code cleanup, some previously ignored patches, and bug fixes"

* tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
  blackfin: cleanup board files
  bf609: clock: drop unused clock bit set/clear functions
  Blackfin: bf537: rename "CONFIG_ADT75"
  Blackfin: bf537: rename "CONFIG_AD7314"
  Blackfin: bf537: rename ad2s120x ->ad2s1200
  blackfin: bf537: fix typo "CONFIG_SND_SOC_ADV80X_MODULE"
  blackfin: dma: current count mmr is read only
  bfin_crc: Move architecture independant crc header file out of the blackfin folder.
  bf54x: drop unuesd HOST status,control,timeout registers bit define macros
  blackfin: portmux: cleanup head file
  Blackfin: remove "config IP_CHECKSUM_L1"
  blackfin: Remove GENERIC_GPIO config option again
  blackfin:Use generic /proc/interrupts implementation
  blackfin: bf60x: fix typo "CONFIG_PM_BFIN_WAKE_PA15_POL"

10 years agoMerge tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 13 Apr 2014 00:23:12 +0000 (17:23 -0700)]
Merge tag 'remoteproc-3.15-cleanups' of git://git./linux/kernel/git/ohad/remoteproc

Pull remoteproc cleanups from Ohad Ben-Cohen:
 "Several remoteproc cleanup patches coming from Jingoo Han, Julia
  Lawall and Uwe Kleine-König"

* tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
  remoteproc/ste_modem: staticize local symbols
  remoteproc/davinci: simplify use of devm_ioremap_resource
  remoteproc/davinci: drop needless devm_clk_put

10 years agoMerge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel
Linus Torvalds [Sun, 13 Apr 2014 00:00:40 +0000 (17:00 -0700)]
Merge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel

Pull llvm patches from Behan Webster:
 "These are some initial updates to support compiling the kernel with
  clang.

  These patches have been through the proper reviews to the best of my
  ability, and have been soaking in linux-next for a few weeks.  These
  patches by themselves still do not completely allow clang to be used
  with the kernel code, but lay the foundation for other patches which
  are still under review.

  Several other of the LLVMLinux patches have been already added via
  maintainer trees"

* tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel:
  x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"
  x86 kbuild: LLVMLinux: More cc-options added for clang
  x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
  LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h
  LLVMLinux: Remove warning about returning an uninitialized variable
  kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang
  Documentation: LLVMLinux: Update Documentation/dontdiff
  kbuild: LLVMLinux: Adapt warnings for compilation with clang
  kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang

10 years agoMerge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target...
Linus Torvalds [Sat, 12 Apr 2014 23:51:08 +0000 (16:51 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending

Pull SCSI target updates from Nicholas Bellinger:
 "Here are the target pending updates for v3.15-rc1.  Apologies in
  advance for waiting until the second to last day of the merge window
  to send these out.

  The highlights this round include:

   - iser-target support for T10 PI (DIF) offloads (Sagi + Or)
   - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
   - Pass in transport supported PI at session initialization (Sagi + MKP + nab)
   - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
   - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
   - Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
   - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)

  Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
  metadata into KVM guest have been left-out for now, as there where a
  few comments from MST + Paolo that where not able to be addressed in
  time for v3.15.  Please expect this feature for v3.16-rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
  ib_srpt: Use correct ib_sg_dma primitives
  target/tcm_fc: Rename ft_tport_create to ft_tport_get
  target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
  target/tcm_fc: Rename structs and list members for clarity
  target/tcm_fc: Limit to 1 TPG per wwn
  target/tcm_fc: Don't export ft_lport_list
  target/tcm_fc: Fix use-after-free of ft_tpg
  target: Add check to prevent Abort Task from aborting itself
  target: Enable READ_STRIP emulation in target_complete_ok_work
  target/sbc: Add sbc_dif_read_strip software emulation
  target: Enable WRITE_INSERT emulation in target_execute_cmd
  target/sbc: Add sbc_dif_generate software emulation
  target/sbc: Only expose PI read_cap16 bits when supported by fabric
  target/spc: Only expose PI mode page bits when supported by fabric
  target/spc: Only expose PI inquiry bits when supported by fabric
  target: Pass in transport supported PI at session initialization
  target/iblock: Fix double bioset_integrity_free bug
  Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
  target/rd: T10-Dif: RAM disk is allocating more space than required.
  iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
  ...

10 years agoMerge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 12 Apr 2014 23:18:17 +0000 (16:18 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "A series of bug fix patches for v3.15-rc1.  Most are just driver
  fixes.  There are some changes at remote controller core level, fixing
  some definitions on a new API added for Kernel v3.15.

  It also adds the missing include at include/uapi/linux/v4l2-common.h,
  to allow its compilation on userspace, as pointed by you"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (24 commits)
  [media] gpsca: remove the risk of a division by zero
  [media] stk1160: warrant a NUL terminated string
  [media] v4l: ti-vpe: retain v4l2_buffer flags for captured buffers
  [media] v4l: ti-vpe: Set correct field parameter for output and capture buffers
  [media] v4l: ti-vpe: zero out reserved fields in try_fmt
  [media] v4l: ti-vpe: Fix initial configuration queue data
  [media] v4l: ti-vpe: Use correct bus_info name for the device in querycap
  [media] v4l: ti-vpe: report correct capabilities in querycap
  [media] v4l: ti-vpe: Allow usage of smaller images
  [media] v4l: ti-vpe: Use video_device_release_empty
  [media] v4l: ti-vpe: Make sure in job_ready that we have the needed number of dst_bufs
  [media] lgdt3305: include sleep functionality in lgdt3304_ops
  [media] drx-j: use customise option correctly
  [media] m88rs2000: fix sparse static warnings
  [media] r820t: fix size and init values
  [media] rc-core: remove generic scancode filter
  [media] rc-core: split dev->s_filter
  [media] rc-core: do not change 32bit NEC scancode format for now
  [media] rtl28xxu: remove duplicate ID 0458:707f Genius TVGo DVB-T03
  [media] xc2028: add missing break to switch
  ...