Shirley Ma [Wed, 28 May 2014 14:34:24 +0000 (10:34 -0400)]
xprtrdma: Allocate missing pagelist
GETACL relies on transport layer to alloc memory for reply buffer.
However xprtrdma assumes that the reply buffer (pagelist) has been
pre-allocated in upper layer. This problem was reported by IOL OFA lab
test on PPC.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Edward Mossman <emossman@iol.unh.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:34:16 +0000 (10:34 -0400)]
xprtrdma: Remove Tavor MTU setting
Clean up. Remove HCA-specific clutter in xprtrdma, which is
supposed to be device-independent.
Hal Rosenstock <hal@dev.mellanox.co.il> observes:
> Note that there is OpenSM option (enable_quirks) to return 1K MTU
> in SA PathRecord responses for Tavor so that can be used for this.
> The default setting for enable_quirks is FALSE so that would need
> changing.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:34:07 +0000 (10:34 -0400)]
xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a
disconnect, his HCA is failing to create a fresh QP, leaving
ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
wake up and post LOCAL_INV as they exit, causing an oops.
rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's
generic layer. xprt_connect_status() does not recognize -EPERM, so
it kills pending RPC tasks immediately rather than retrying the
connect.
Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL. If pending RPC
tasks wake and exit, LOCAL_INV work requests will flush rather than
oops.
On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp. But be sure not to try to destroy
a NULL QP when rpcrdma_ep_connect() is retried.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:59 +0000 (10:33 -0400)]
xprtrdma: Reduce the number of hardway buffer allocations
While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
settings determine whether an inline request is used, or whether
read or write chunks lists are built. The current default value of
these settings is 1024. Any RPC request smaller than 1024 bytes is
sent to the NFS server completely inline.
rpcrdma_buffer_create() allocates and pre-registers a set of RPC
buffers for each transport instance, also based on the inline rsize
and wsize settings.
RPC/RDMA requests and replies are built in these buffers. However,
if an RPC/RDMA request is expected to be larger than 1024, a buffer
has to be allocated and registered for that RPC, and deregistered
and released when the RPC is complete. This is known has a
"hardway allocation."
Since the introduction of NFSv4, the size of RPC requests has become
larger, and hardway allocations are thus more frequent. Hardway
allocations are significant overhead, and they waste the existing
RPC buffers pre-allocated by rpcrdma_buffer_create().
We'd like fewer hardway allocations.
Increasing the size of the pre-registered buffers is the most direct
way to do this. However, a blanket increase of the inline thresholds
has interoperability consequences.
On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
bytes for each RPC request buffer, using kmalloc(). Due to internal
fragmentation, this wastes nearly 1200 bytes because kmalloc()
already returns an 8192-byte piece of memory for a 7000-byte
allocation request, though the extra space remains unused.
So let's round up the size of the pre-allocated buffers, and make
use of the unused space in the kmalloc'd memory.
This change reduces the amount of hardway allocated memory for an
NFSv4 general connectathon run from
1322092 to 9472 bytes (99%).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:51 +0000 (10:33 -0400)]
xprtrdma: Limit work done by completion handler
Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady
stream of CQ events could starve other work because of the boundless
loop pooling in rpcrdma_{send,recv}_poll().
Instead of a (potentially infinite) while loop, return after
collecting a budgeted number of completions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:42 +0000 (10:33 -0400)]
xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
Change the completion handlers to grab up to 16 items per
ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16
items are returned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:34 +0000 (10:33 -0400)]
xprtrmda: Reduce lock contention in completion handlers
Skip the ib_poll_cq() after re-arming, if the provider knows there
are no additional items waiting. (Have a look at commit
ed23a727 for
more details).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:25 +0000 (10:33 -0400)]
xprtrdma: Split the completion queue
The current CQ handler uses the ib_wc.opcode field to distinguish
between event types. However, the contents of that field are not
reliable if the completion status is not IB_WC_SUCCESS.
When an error completion occurs on a send event, the CQ handler
schedules a tasklet with something that is not a struct rpcrdma_rep.
This is never correct behavior, and sometimes it results in a panic.
To resolve this issue, split the completion queue into a send CQ and
a receive CQ. The send CQ handler now handles only struct rpcrdma_mw
wr_id's, and the receive CQ handler now handles only struct
rpcrdma_rep wr_id's.
Fix suggested by Shirley Ma <shirley.ma@oracle.com>
Reported-by: Rafael Reiter <rafael.reiter@ims.co.at>
Fixes: 5c635e09cec0feeeb310968e51dad01040244851
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Klemens Senn <klemens.senn@ims.co.at>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:16 +0000 (10:33 -0400)]
xprtrdma: Make rpcrdma_ep_destroy() return void
Clean up: rpcrdma_ep_destroy() returns a value that is used
only to print a debugging message. rpcrdma_ep_destroy() already
prints debugging messages in all error cases.
Make rpcrdma_ep_destroy() return void instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:08 +0000 (10:33 -0400)]
xprtrdma: Simplify rpcrdma_deregister_external() synopsis
Clean up: All remaining callers of rpcrdma_deregister_external()
pass NULL as the last argument, so remove that argument.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:33:00 +0000 (10:33 -0400)]
xprtrdma: mount reports "Invalid mount option" if memreg mode not supported
If the selected memory registration mode is not supported by the
underlying provider/HCA, the NFS mount command reports that there was
an invalid mount option, and fails. This is misleading.
Reporting a problem allocating memory is a lot closer to the truth.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:32:51 +0000 (10:32 -0400)]
xprtrdma: Fall back to MTHCAFMR when FRMR is not supported
An audit of in-kernel RDMA providers that do not support the FRMR
memory registration shows that several of them support MTHCAFMR.
Prefer MTHCAFMR when FRMR is not supported.
If MTHCAFMR is not supported, only then choose ALLPHYSICAL.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:32:43 +0000 (10:32 -0400)]
xprtrdma: Remove REGISTER memory registration mode
All kernel RDMA providers except amso1100 support either MTHCAFMR
or FRMR, both of which are faster than REGISTER. amso1100 can
continue to use ALLPHYSICAL.
The only other ULP consumer in the kernel that uses the reg_phys_mr
verb is Lustre.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:32:34 +0000 (10:32 -0400)]
xprtrdma: Remove MEMWINDOWS registration modes
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were
intended as stop-gap modes before the introduction of FRMR. They
are now considered obsolete.
MEMWINDOWS_ASYNC is also considered unsafe because it can leave
client memory registered and exposed for an indeterminant time after
each I/O.
At this point, the MEMWINDOWS modes add needless complexity, so
remove them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:32:26 +0000 (10:32 -0400)]
xprtrdma: Remove BOUNCEBUFFERS memory registration mode
Clean up: This memory registration mode is slow and was never
meant for use in production environments. Remove it to reduce
implementation complexity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 May 2014 14:32:17 +0000 (10:32 -0400)]
xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context
An IB provider can invoke rpcrdma_conn_func() in an IRQ context,
thus rpcrdma_conn_func() cannot be allowed to directly invoke
generic RPC functions like xprt_wake_pending_tasks().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Allen Andrews [Wed, 28 May 2014 14:32:09 +0000 (10:32 -0400)]
nfs-rdma: Fix for FMR leaks
Two memory region leaks were found during testing:
1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's
ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is
called. If ib_alloc_fast_reg_page_list returns an error it bails out of
the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a
memory leak. Added code to dereg the last frmr if
ib_alloc_fast_reg_page_list fails.
2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free
the MR's on the rb_mws list if there are rb_send_bufs present. However, in
rpcrdma_buffer_create while the rb_mws list is being built if one of the MR
allocation requests fail after some MR's have been allocated on the rb_mws
list the routine never gets to create any rb_send_bufs but instead jumps to
the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws
list because the rb_send_bufs were never created. This leaks all the MR's
on the rb_mws list that were created prior to one of the MR allocations
failing.
Issue(2) was seen during testing. Our adapter had a finite number of MR's
available and we created enough connections to where we saw an MR
allocation failure on our Nth NFS connection request. After the kernel
cleaned up the resources it had allocated for the Nth connection we noticed
that FMR's had been leaked due to the coding error described above.
Issue(1) was seen during a code review while debugging issue(2).
Signed-off-by: Allen Andrews <allen.andrews@emulex.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Steve Wise [Wed, 28 May 2014 14:32:00 +0000 (10:32 -0400)]
xprtrdma: mind the device's max fast register page list depth
Some rdma devices don't support a fast register page list depth of
at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast
register regions according to the minimum of the device max supported
depth or RPCRDMA_MAX_DATA_SEGS.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tom Haynes [Mon, 12 May 2014 21:35:52 +0000 (14:35 -0700)]
Push the file layout driver into a subdirectory
The object and block layouts already exist in their own
subdirectories. This patch completes the set!
Note that as a layout denotes nfs4 already, I stripped
that prefix out of the file names.
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Fri, 30 May 2014 00:17:17 +0000 (20:17 -0400)]
pNFS: Handle allocation errors correctly in objlayout_alloc_layout_hdr()
Return the NULL pointer when the allocation fails.
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Fri, 30 May 2014 00:06:55 +0000 (20:06 -0400)]
pNFS: Handle allocation errors correctly in filelayout_alloc_layout_hdr()
Return the NULL pointer when the allocation fails.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org> # 3.5.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Scott Mayhew [Thu, 29 May 2014 20:41:22 +0000 (16:41 -0400)]
nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data()
Those flags are obsolete and checking them can incorrectly cause
remount operations to fail.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Andy Adamson [Fri, 23 May 2014 13:22:59 +0000 (06:22 -0700)]
NFSv4: Use error handler on failed GETATTR with successful OPEN
Place the call to resend the failed GETATTR under the error handler so that
when appropriate, the GETATTR is retried more than once.
The server can fail the GETATTR op in the OPEN compound with a recoverable
error such as NFS4ERR_DELAY. In the case of an O_EXCL open, the server has
created the file, so a retrans of the OPEN call will fail with NFS4ERR_EXIST.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Thu, 29 May 2014 15:45:57 +0000 (11:45 -0400)]
NFS: Fix a potential busy wait in nfs_page_group_lock
We cannot allow nfs_page_group_lock to use TASK_KILLABLE here, since
the loop would cause a busy wait if somebody kills the task.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Thu, 29 May 2014 15:38:15 +0000 (11:38 -0400)]
NFS: Fix error handling in __nfs_pageio_add_request
Handle the case where nfs_create_request() returns an error.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
David Rientjes [Wed, 7 May 2014 20:03:41 +0000 (13:03 -0700)]
net, sunrpc: suppress allocation warning in rpc_malloc()
rpc_malloc() allocates with GFP_NOWAIT without making any attempt at
reclaim so it easily fails when low on memory. This ends up spamming the
kernel log:
SLAB: Unable to allocate memory on node 0 (gfp=0x4000)
cache: kmalloc-8192, object size: 8192, order: 1
node 0: slabs: 207/207, objs: 207/207, free: 0
rekonq: page allocation failure: order:1, mode:0x204000
CPU: 2 PID: 14321 Comm: rekonq Tainted: G O 3.15.0-rc3-12.gfc9498b-desktop+ #6
Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105 07/23/2010
0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000
ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000
ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000
Call Trace:
[<
ffffffff815e693c>] dump_stack+0x4d/0x6f
[<
ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140
[<
ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30
[<
ffffffff811824a8>] kmem_getpages+0x58/0x140
[<
ffffffff81183de6>] fallback_alloc+0x1d6/0x210
[<
ffffffff81183be3>] ____cache_alloc_node+0x123/0x150
[<
ffffffff81185953>] __kmalloc+0x203/0x490
[<
ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc]
[<
ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc]
[<
ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc]
[<
ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc]
[<
ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc]
[<
ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4]
[<
ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4]
[<
ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4]
[<
ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4]
[<
ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs]
[<
ffffffff811baa71>] notify_change+0x231/0x380
[<
ffffffff8119cf5c>] chmod_common+0xfc/0x120
[<
ffffffff8119df80>] SyS_chmod+0x40/0x90
[<
ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f
...
If the allocation fails, simply return NULL and avoid spamming the kernel
log.
Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:57 +0000 (11:56 -0400)]
nfs: support page groups in nfs_read_completion
nfs_read_completion relied on the fact that there was a 1:1 mapping
of page to nfs_request, but this has now changed.
Regions not covered by a request have already been zeroed elsewhere.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:56 +0000 (11:56 -0400)]
pnfs: filelayout: support non page aligned layouts
Use the new pg_test interface to adjust requests to fit in the current
stripe / segment.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:55 +0000 (11:56 -0400)]
pnfs: allow non page aligned pnfs layout segments
Remove alignment checks that would revert to MDS and change pg_test
to return the max ammount left in the segment (or other pg_test call)
up to size of passed request, or 0 if no space is left.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:54 +0000 (11:56 -0400)]
pnfs: support multiple verfs per direct req
Support direct requests that span multiple pnfs data servers by
comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
Continue to use dreq->verf if the MDS is used / non-pNFS.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:53 +0000 (11:56 -0400)]
nfs: remove data list from pgio header
Since the ability to split pages into subpage requests has been added,
nfs_pgio_header->rpc_list only ever has one pgio data.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:52 +0000 (11:56 -0400)]
nfs: use > 1 request to handle bsize < PAGE_SIZE
Use the newly added support for multiple requests per page for
rsize/wsize < PAGE_SIZE, instead of having multiple read / write
data structures per pageio header.
This allows us to get rid of nfs_pgio_multi.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:51 +0000 (11:56 -0400)]
nfs: chain calls to pg_test
Now that pg_test can change the size of the request (by returning a non-zero
size smaller than the request), pg_test functions that call other
pg_test functions must return the minimum of the result - or 0 if any fail.
Also clean up the logic of some pg_test functions so that all checks are
for contitions where coalescing is not possible.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:50 +0000 (11:56 -0400)]
nfs: allow coalescing of subpage requests
Remove check that the request covers a whole page.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:49 +0000 (11:56 -0400)]
pnfs: clean up filelayout_alloc_commit_info
Remove unneeded else statement and clean up how commit info
dataserver buckets are replaced.
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:48 +0000 (11:56 -0400)]
nfs: page group support in nfs_mark_uptodate
Change how nfs_mark_uptodate checks to see if writes cover a whole page.
This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:47 +0000 (11:56 -0400)]
nfs: page group syncing in write path
Operations that modify state for a whole page must be syncronized across
all requests within a page group. In the write path, this is calling
end_page_writeback and removing the head request from an inode.
Both of these operations should not be called until all requests
in a page group have reached the point where they would call them.
This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:46 +0000 (11:56 -0400)]
nfs: page group syncing in read path
Operations that modify state for a whole page must be syncronized across
all requests within a page group. In the read path, this is calling
unlock_page and SetPageUptodate. Both of these functions should not be
called until all requests in a page group have reached the point where
they would call them.
This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:45 +0000 (11:56 -0400)]
nfs: add support for multiple nfs reqs per page
Add "page groups" - a circular list of nfs requests (struct nfs_page)
that all reference the same page. This gives nfs read and write paths
the ability to account for sub-page regions independently. This
somewhat follows the design of struct buffer_head's sub-page
accounting.
Only "head" requests are ever added/removed from the inode list in
the buffered write path. "head" and "sub" requests are treated the
same through the read path and the rest of the write/commit path.
Requests are given an extra reference across the life of the list.
Page groups are never rejoined after being split. If the read/write
request fails and the client falls back to another path (ie revert
to MDS in PNFS case), the already split requests are pushed through
the recoalescing code again, which may split them further and then
coalesce them into properly sized requests on the wire. Fragmentation
shouldn't be a problem with the current design, because we flush all
requests in page group when a non-contiguous request is added, so
the only time resplitting should occur is on a resend of a read or
write.
This patch lays the groundwork for sub-page splitting, but does not
actually do any splitting. For now all page groups have one request
as pg_test functions don't yet split pages. There are several related
patches that are needed support multiple requests per page group.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:44 +0000 (11:56 -0400)]
nfs: call nfs_can_coalesce_requests for every req
Call nfs_can_coalesce_requests for every request, even the first one.
This is needed for future patches to give pg_test a way to inform
add_request to reduce the size of the request.
Now @prev can be null in nfs_can_coalesce_requests and pg_test functions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:43 +0000 (11:56 -0400)]
nfs: modify pg_test interface to return size_t
This is a step toward allowing pg_test to inform the the
coalescing code to reduce the size of requests so they may fit in
whatever scheme the pg_test callback wants to define.
For now, just return the size of the request if there is space, or 0
if there is not. This shouldn't change any behavior as it acts
the same as when the pg_test functions returned bool.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:42 +0000 (11:56 -0400)]
nfs: remove unused arg from nfs_create_request
@inode is passed but not used.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:41 +0000 (11:56 -0400)]
nfs: clean up PG_* flags
Remove unused flags PG_NEED_COMMIT and PG_NEED_RESCHED.
Add comments describing how each flag is used.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Weston Andros Adamson [Thu, 15 May 2014 15:56:40 +0000 (11:56 -0400)]
pnfs: fix race in filelayout commit path
Hold the lock while modifying commit info dataserver buckets.
The following oops can be reproduced by running iozone for a while against
a 2 DS pynfs filelayout server.
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache
CPU: 0 PID: 903 Comm: iozone Not tainted 3.15.0-rc1-branch-dros_testing+ #44
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
task:
ffff880078164480 ti:
ffff88006e972000 task.ti:
ffff88006e972000
RIP: 0010:[<
ffffffffa01936e1>] [<
ffffffffa01936e1>] nfs_init_commit+0x22/0x
RSP: 0018:
ffff88006e973d30 EFLAGS:
00010246
RAX:
ffff88006e973e00 RBX:
ffff88006e828800 RCX:
ffff88006e973e10
RDX:
0000000000000000 RSI:
ffff88006e973e00 RDI:
dead4ead00000000
RBP:
ffff88006e973d38 R08:
ffff88006e8289d8 R09:
0000000000000000
R10:
ffff88006e8289d8 R11:
0000000000016988 R12:
ffff88006e973b98
R13:
ffff88007a0a6648 R14:
ffff88006e973e10 R15:
ffff88006e828800
FS:
00007f2ce396b740(0000) GS:
ffff88007f200000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f03278a1000 CR3:
0000000079043000 CR4:
00000000001407f0
Stack:
ffff88006e8289d8 ffff88006e973da8 ffffffffa00f144f ffff88006e9478c0
ffff88006e973e00 ffff88006de21080 0000000100000002 ffff880079be6c48
ffff88006e973d70 ffff88006e973d70 ffff88006e973e10 ffff88006de21080
Call Trace:
[<
ffffffffa00f144f>] filelayout_commit_pagelist+0x1ae/0x34a [nfs_layout_nfsv
[<
ffffffffa0194f72>] nfs_generic_commit_list+0x92/0xc4 [nfs]
[<
ffffffffa0195053>] nfs_commit_inode+0xaf/0x114 [nfs]
[<
ffffffffa01892bd>] nfs_file_fsync_commit+0x82/0xbe [nfs]
[<
ffffffffa01ceb0d>] nfs4_file_fsync+0x59/0x9b [nfsv4]
[<
ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
[<
ffffffff8114ee60>] vfs_fsync+0x1c/0x1e
[<
ffffffffa01891c2>] nfs_file_flush+0x7f/0x84 [nfs]
[<
ffffffff81127a43>] filp_close+0x3c/0x72
[<
ffffffff81140e12>] __close_fd+0x82/0x9a
[<
ffffffff81127a9c>] SyS_close+0x23/0x4c
[<
ffffffff814acd12>] system_call_fastpath+0x16/0x1b
Code: 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8
RIP [<
ffffffffa01936e1>] nfs_init_commit+0x22/0xe1 [nfs]
RSP <
ffff88006e973d30>
---[ end trace
732fe6419b235e2f ]---
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:40 +0000 (09:12 -0400)]
NFS: Create a common nfs_pageio_ops struct
At this point the read and write structures look identical, so combine
them into something shared by both.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:39 +0000 (09:12 -0400)]
NFS: Create a common generic_pg_pgios()
What we have here is two functions that look identical. Let's share
some more code!
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:38 +0000 (09:12 -0400)]
NFS: Create a common multiple_pgios() function
Once again, these two functions look identical in the read and write
case. Time to combine them together!
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:37 +0000 (09:12 -0400)]
NFS: Create a common initiate_pgio() function
Most of this code is the same for both the read and write paths, so
combine everything and use the rw_ops when necessary.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:36 +0000 (09:12 -0400)]
NFS: Create a generic_pgio function
These functions are almost identical on both the read and write side.
FLUSH_COND_STABLE will never be set for the read path, so leaving it in
the generic code won't hurt anything.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:35 +0000 (09:12 -0400)]
NFS: Create a common pgio_error function
At this point, the read and write versions of this function look
identical so both should use the same function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:34 +0000 (09:12 -0400)]
NFS: Create a common rpcsetup function for reads and writes
Write adds a little bit of code dealing with flush flags, but since
"how" will always be 0 when reading we can share the code.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:33 +0000 (09:12 -0400)]
NFS: Create a common rpc_call_ops struct
The read and write paths set up this struct in exactly the same way, so
create a single shared struct.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:32 +0000 (09:12 -0400)]
NFS: Create a common nfs_pgio_result_common function
Combining these functions will let me make a single nfs_rw_common_ops
struct (see the next patch).
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:31 +0000 (09:12 -0400)]
NFS: Create a common pgio_rpc_prepare function
The read and write paths do exactly the same thing for the rpc_prepare
rpc_op. This patch combines them together into a single function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:30 +0000 (09:12 -0400)]
NFS: Create a common rw_header_alloc and rw_header_free function
I create a new struct nfs_rw_ops to decide the differences between reads
and writes. This struct will be set when initializing a new
nfs_pgio_descriptor, and then passed on to the nfs_rw_header when a new
header is allocated.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:29 +0000 (09:12 -0400)]
NFS: Create a common pgio_alloc and pgio_release function
These functions are identical for the read and write paths so they can
be combined.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:28 +0000 (09:12 -0400)]
NFS: Move the write verifier into the nfs_pgio_header
The header had a pointer to the verifier that was set from the old write
data struct. We don't need to keep the pointer around now that we have
shared structures.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:27 +0000 (09:12 -0400)]
NFS: Create a common read and write header struct
The only difference is the write verifier field, but we can keep that
for a little bit longer.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:26 +0000 (09:12 -0400)]
NFS: Create a common read and write data struct
At this point, the only difference between nfs_read_data and
nfs_write_data is the write verifier.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:25 +0000 (09:12 -0400)]
NFS: Create a common results structure for reads and writes
Reads and writes have very similar results. This patch combines the two
structs together with comments to show where the differing fields are
used.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Tue, 6 May 2014 13:12:24 +0000 (09:12 -0400)]
NFS: Create a common argument structure for reads and writes
Reads and writes have very similar arguments. This patch combines them
together and documents the few fields used only by write.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Wed, 16 Apr 2014 13:07:22 +0000 (15:07 +0200)]
nfs: remove ->read_pageio_init from rpc ops
The read_pageio_init method is just a very convoluted way to grab the
right nfs_pageio_ops vector. The vector to chose is not a choice of
protocol version, but just a pNFS vs MDS I/O choice that can simply be
done inside nfs_pageio_init_read based on the presence of a layout
driver, and a new force_mds flag to the special case of falling back
to MDS I/O on a pNFS-capable volume.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Wed, 16 Apr 2014 13:07:21 +0000 (15:07 +0200)]
nfs: remove ->write_pageio_init from rpc ops
The write_pageio_init method is just a very convoluted way to grab the
right nfs_pageio_ops vector. The vector to chose is not a choice of
protocol version, but just a pNFS vs MDS I/O choice that can simply be
done inside nfs_pageio_init_write based on the presence of a layout
driver, and a new force_mds flag to the special case of falling back
to MDS I/O on a pNFS-capable volume.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 21 Apr 2014 17:29:17 +0000 (10:29 -0700)]
nfs: commit layouts in fdatasync
"fdatasync() is similar to fsync(), but does not flush modified metadata
unless that metadata is needed in order to allow a subsequent data
retrieval to be correctly handled."
We absolutely need to commit the layouts to be able to retrieve the data
in case either the client, the server or the storage subsystem go down.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Sun, 18 May 2014 17:47:14 +0000 (13:47 -0400)]
SUNRPC: Fix a module reference issue in rpcsec_gss
We're not taking a reference in the case where _gss_mech_get_by_pseudoflavor
loops without finding the correct rpcsec_gss flavour, so why are we
releasing it?
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Sun, 13 Apr 2014 15:11:31 +0000 (11:11 -0400)]
NFS: Don't ignore suid/sgid bit changes after a successful write
If we suspect that the server may have cleared the suid/sgid bit,
then mark the inode for revalidation.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Tue, 15 Apr 2014 14:07:57 +0000 (10:07 -0400)]
NFS: Don't declare inode uptodate unless all attributes were checked
Fix a bug, whereby nfs_update_inode() was declaring the inode to be
up to date despite not having checked all the attributes.
The bug occurs because the temporary variable in which we cache
the validity information is 'sanitised' before reapplying to
nfsi->cache_validity.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Kinglong Mee [Tue, 15 Apr 2014 09:22:59 +0000 (17:22 +0800)]
NFS: Fix memroy leak for double mounts
When double mounting same nfs filesystem, the devname saved in d_fsdata
will be lost.The second mount should not change the devname that
be saved in d_fsdata.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Linus Torvalds [Sun, 13 Apr 2014 21:18:35 +0000 (14:18 -0700)]
Linux 3.15-rc1
Geert Uytterhoeven [Sun, 13 Apr 2014 18:46:22 +0000 (20:46 +0200)]
mm: Initialize error in shmem_file_aio_read()
Some versions of gcc even warn about it:
mm/shmem.c: In function ‘shmem_file_aio_read’:
mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function
If the loop is aborted during the first iteration by one of the two
first break statements, error will be uninitialized.
Introduced by commit
6e58e79db8a1 ("introduce copy_page_to_iter, kill
loop over iovec in generic_file_aio_read()").
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Sun, 13 Apr 2014 18:46:21 +0000 (20:46 +0200)]
cifs: Use min_t() when comparing "size_t" and "unsigned long"
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":
fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
Introduced by commit
7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 13 Apr 2014 20:28:13 +0000 (13:28 -0700)]
Merge branch 'slab/next' of git://git./linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
"The biggest change is byte-sized freelist indices which reduces slab
freelist memory usage:
https://lkml.org/lkml/2013/12/2/64"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
mm: slab/slub: use page->list consistently instead of page->lru
mm/slab.c: cleanup outdated comments and unify variables naming
slab: fix wrongly used macro
slub: fix high order page allocation problem with __GFP_NOFAIL
slab: Make allocations with GFP_ZERO slightly more efficient
slab: make more slab management structure off the slab
slab: introduce byte sized index for the freelist of a slab
slab: restrict the number of objects in a slab
slab: introduce helper functions to get/set free object
slab: factor out calculate nr objects in cache_estimate
Linus Torvalds [Sun, 13 Apr 2014 01:22:27 +0000 (18:22 -0700)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild
Pull misc kbuild changes from Michal Marek:
"Here is the non-critical part of kbuild:
- One bogus coccinelle check removed, one check fixed not to suggest
the obsolete PTR_RET macro
- scripts/tags.sh does not index the generated *.mod.c files
- new objdiff tool to list differences between two versions of an
object file
- A fix for scripts/bootgraph.pl"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/coccinelle: Use PTR_ERR_OR_ZERO
scripts/bootgraph.pl: Add graphic header
scripts: objdiff: detect object code changes between two commits
Coccicheck: Remove memcpy to struct assignment test
scripts/tags.sh: Ignore *.mod.c
Mikulas Patocka [Wed, 9 Apr 2014 01:52:05 +0000 (21:52 -0400)]
sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.
When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.
This function aborts them with DID_SOFT_ERROR.
If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.
The result is that disk returning QUEUE FULL causes request failures.
The error was reproduced on 53c895 with COMPAQ
BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags. The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags. The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mackerras [Fri, 11 Apr 2014 06:43:35 +0000 (16:43 +1000)]
powerpc: Don't try to set LPCR unless we're in hypervisor mode
Commit
8f619b5429d9 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Wed, 9 Apr 2014 18:55:07 +0000 (11:55 -0700)]
futex: update documentation for ordering guarantees
Commits
11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and
69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.
The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 13 Apr 2014 00:31:22 +0000 (17:31 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull yet more networking updates from David Miller:
1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.
2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.
3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.
4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.
5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.
It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.
So just kill it off and thus fix all these use-after-free bugs as a
side effect.
6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.
8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.
9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...
Linus Torvalds [Sun, 13 Apr 2014 00:26:45 +0000 (17:26 -0700)]
Merge tag 'blackfin-for-linus' of git://git./linux/kernel/git/realmz6/blackfin-linux
Pull blackfin updates from Steven Miao:
"Code cleanup, some previously ignored patches, and bug fixes"
* tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
blackfin: cleanup board files
bf609: clock: drop unused clock bit set/clear functions
Blackfin: bf537: rename "CONFIG_ADT75"
Blackfin: bf537: rename "CONFIG_AD7314"
Blackfin: bf537: rename ad2s120x ->ad2s1200
blackfin: bf537: fix typo "CONFIG_SND_SOC_ADV80X_MODULE"
blackfin: dma: current count mmr is read only
bfin_crc: Move architecture independant crc header file out of the blackfin folder.
bf54x: drop unuesd HOST status,control,timeout registers bit define macros
blackfin: portmux: cleanup head file
Blackfin: remove "config IP_CHECKSUM_L1"
blackfin: Remove GENERIC_GPIO config option again
blackfin:Use generic /proc/interrupts implementation
blackfin: bf60x: fix typo "CONFIG_PM_BFIN_WAKE_PA15_POL"
Linus Torvalds [Sun, 13 Apr 2014 00:23:12 +0000 (17:23 -0700)]
Merge tag 'remoteproc-3.15-cleanups' of git://git./linux/kernel/git/ohad/remoteproc
Pull remoteproc cleanups from Ohad Ben-Cohen:
"Several remoteproc cleanup patches coming from Jingoo Han, Julia
Lawall and Uwe Kleine-König"
* tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc/ste_modem: staticize local symbols
remoteproc/davinci: simplify use of devm_ioremap_resource
remoteproc/davinci: drop needless devm_clk_put
Linus Torvalds [Sun, 13 Apr 2014 00:00:40 +0000 (17:00 -0700)]
Merge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel
Pull llvm patches from Behan Webster:
"These are some initial updates to support compiling the kernel with
clang.
These patches have been through the proper reviews to the best of my
ability, and have been soaking in linux-next for a few weeks. These
patches by themselves still do not completely allow clang to be used
with the kernel code, but lay the foundation for other patches which
are still under review.
Several other of the LLVMLinux patches have been already added via
maintainer trees"
* tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel:
x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"
x86 kbuild: LLVMLinux: More cc-options added for clang
x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h
LLVMLinux: Remove warning about returning an uninitialized variable
kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang
Documentation: LLVMLinux: Update Documentation/dontdiff
kbuild: LLVMLinux: Adapt warnings for compilation with clang
kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
Linus Torvalds [Sat, 12 Apr 2014 23:51:08 +0000 (16:51 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"Here are the target pending updates for v3.15-rc1. Apologies in
advance for waiting until the second to last day of the merge window
to send these out.
The highlights this round include:
- iser-target support for T10 PI (DIF) offloads (Sagi + Or)
- Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
- Pass in transport supported PI at session initialization (Sagi + MKP + nab)
- Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
- Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
- Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
- Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)
Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
metadata into KVM guest have been left-out for now, as there where a
few comments from MST + Paolo that where not able to be addressed in
time for v3.15. Please expect this feature for v3.16-rc1"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
ib_srpt: Use correct ib_sg_dma primitives
target/tcm_fc: Rename ft_tport_create to ft_tport_get
target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
target/tcm_fc: Rename structs and list members for clarity
target/tcm_fc: Limit to 1 TPG per wwn
target/tcm_fc: Don't export ft_lport_list
target/tcm_fc: Fix use-after-free of ft_tpg
target: Add check to prevent Abort Task from aborting itself
target: Enable READ_STRIP emulation in target_complete_ok_work
target/sbc: Add sbc_dif_read_strip software emulation
target: Enable WRITE_INSERT emulation in target_execute_cmd
target/sbc: Add sbc_dif_generate software emulation
target/sbc: Only expose PI read_cap16 bits when supported by fabric
target/spc: Only expose PI mode page bits when supported by fabric
target/spc: Only expose PI inquiry bits when supported by fabric
target: Pass in transport supported PI at session initialization
target/iblock: Fix double bioset_integrity_free bug
Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
target/rd: T10-Dif: RAM disk is allocating more space than required.
iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
...
Linus Torvalds [Sat, 12 Apr 2014 23:18:17 +0000 (16:18 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A series of bug fix patches for v3.15-rc1. Most are just driver
fixes. There are some changes at remote controller core level, fixing
some definitions on a new API added for Kernel v3.15.
It also adds the missing include at include/uapi/linux/v4l2-common.h,
to allow its compilation on userspace, as pointed by you"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (24 commits)
[media] gpsca: remove the risk of a division by zero
[media] stk1160: warrant a NUL terminated string
[media] v4l: ti-vpe: retain v4l2_buffer flags for captured buffers
[media] v4l: ti-vpe: Set correct field parameter for output and capture buffers
[media] v4l: ti-vpe: zero out reserved fields in try_fmt
[media] v4l: ti-vpe: Fix initial configuration queue data
[media] v4l: ti-vpe: Use correct bus_info name for the device in querycap
[media] v4l: ti-vpe: report correct capabilities in querycap
[media] v4l: ti-vpe: Allow usage of smaller images
[media] v4l: ti-vpe: Use video_device_release_empty
[media] v4l: ti-vpe: Make sure in job_ready that we have the needed number of dst_bufs
[media] lgdt3305: include sleep functionality in lgdt3304_ops
[media] drx-j: use customise option correctly
[media] m88rs2000: fix sparse static warnings
[media] r820t: fix size and init values
[media] rc-core: remove generic scancode filter
[media] rc-core: split dev->s_filter
[media] rc-core: do not change 32bit NEC scancode format for now
[media] rtl28xxu: remove duplicate ID 0458:707f Genius TVGo DVB-T03
[media] xc2028: add missing break to switch
...
Linus Torvalds [Sat, 12 Apr 2014 23:16:39 +0000 (16:16 -0700)]
Merge tag 'ntb-3.15' of git://github.com/jonmason/ntb
Pull PCIe non-transparent bridge fixes and features from Jon Mason:
"NTB driver bug fixes to address issues in list traversal, skb leak in
ntb_netdev, a typo, and a leak of msix entries in the error path.
Clean ups of the event handling logic, as well as a overall style
cleanup. Finally, the driver was converted to use the new
pci_enable_msix_range logic (and the refactoring to go along with it)"
* tag 'ntb-3.15' of git://github.com/jonmason/ntb:
ntb: Use pci_enable_msix_range() instead of pci_enable_msix()
ntb: Split ntb_setup_msix() into separate BWD/SNB routines
ntb: Use pci_msix_vec_count() to obtain number of MSI-Xs
NTB: Code Style Clean-up
NTB: client event cleanup
ntb: Fix leakage of ntb_device::msix_entries[] array
NTB: Fix typo in setting one translation register
ntb_netdev: Fix skb free issue in open
ntb_netdev: Fix list_for_each_entry exit issue
Linus Torvalds [Sat, 12 Apr 2014 22:39:53 +0000 (15:39 -0700)]
ceph: fix pr_fmt() redefinition
The vfs merge caused a latent bug to show up:
In file included from fs/ceph/super.h:4:0,
from fs/ceph/ioctl.c:3:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
^
In file included from include/linux/kernel.h:13:0,
from include/linux/uio.h:12,
from include/linux/socket.h:7,
from include/uapi/linux/in.h:22,
from include/linux/in.h:23,
from fs/ceph/ioctl.c:1:
include/linux/printk.h:214:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^
where the reason is that <linux/ceph_debug.h> is included much too late
for the "pr_fmt()" define.
The include of <linux/ceph_debug.h> needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of <linux/kernel.h>.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 12 Apr 2014 21:49:50 +0000 (14:49 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
David S. Miller [Sat, 12 Apr 2014 21:03:20 +0000 (17:03 -0400)]
Merge branch 'tunnels'
Nicolas Dichtel says:
====================
tunnels: don't allow to add the same tunnel twice
This series fixes the check of an existing tunnel with the same
parameters when a new tunnel is added. I've checked all users of
ip_tunnel_newlink(): gre, gretap, ipip and vti. The bug exists only
for gre and vti.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 11 Apr 2014 13:51:19 +0000 (15:51 +0200)]
vti: don't allow to add the same tunnel twice
Before the patch, it was possible to add two times the same tunnel:
ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit
b9959fd3b0fa ("vti: switch to new ip tunnel code").
CC: Cong Wang <amwang@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 11 Apr 2014 13:51:18 +0000 (15:51 +0200)]
gre: don't allow to add the same tunnel twice
Before the patch, it was possible to add two times the same tunnel:
ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit
c54419321455 ("GRE: Refactor GRE tunneling code.").
CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincenzo Maffione [Sat, 12 Apr 2014 09:55:40 +0000 (11:55 +0200)]
drivers: net: xen-netfront: fix array initialization bug
This patch fixes the initialization of an array used in the TX
datapath that was mistakenly initialized together with the
RX datapath arrays. An out of range array access could happen
when RX and TX rings had different sizes.
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 12 Apr 2014 20:36:44 +0000 (16:36 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to e1000, e1000e, igb, igbvf, ixgb, ixgbe,
ixgbevf and i40evf.
Mark fixes an issue with ixgbe and ixgbevf by adding a bit to indicate
when workqueues have been initialized. This permits the register read
error handling from attempting to use them prior to that, which also
generates warnings. Checking for a detected removal after initializing
the work queues allows the probe function to return an error without
getting the workqueue involved. Further, if the error_detected
callback is entered before the workqueues are initialized, exit without
recovery since the device initialization was so truncated.
Francois Romieu provides several patches to all the drivers to remove
the open coded skb_cow_head.
Jakub Kicinski provides a fix for igb where last_rx_timestamp should be
updated only when Rx time stamp is read.
Mitch provides a fix for i40evf where a recent change broke the RSS LUT
programming causing it to be programmed with all 0's.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 12 Apr 2014 20:06:10 +0000 (13:06 -0700)]
Merge tag 'trace-3.15-v2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
"This includes the final patch to clean up and fix the issue with the
design of tracepoints and how a user could register a tracepoint and
have that tracepoint not be activated but no error was shown.
The design was for an out of tree module but broke in tree users. The
clean up was to remove the saving of the hash table of tracepoint
names such that they can be enabled before they exist (enabling a
module tracepoint before that module is loaded). This added more
complexity than needed. The clean up was to remove that code and just
enable tracepoints that exist or fail if they do not.
This removed a lot of code as well as the complexity that it brought.
As a side effect, instead of registering a tracepoint by its name, the
tracepoint needs to be registered with the tracepoint descriptor.
This removes having to duplicate the tracepoint names that are
enabled.
The second patch was added that simplified the way modules were
searched for.
This cleanup required changes that were in the 3.15 queue as well as
some changes that were added late in the 3.14-rc cycle. This final
change waited till the two were merged in upstream and then the change
was added and full tests were run. Unfortunately, the test found some
errors, but after it was already submitted to the for-next branch and
not to be rebased. Sparse errors were detected by Fengguang Wu's bot
tests, and my internal tests discovered that the anonymous union
initialization triggered a bug in older gcc compilers. Luckily, there
was a bugzilla for the gcc bug which gave a work around to the
problem. The third and fourth patch handled the sparse error and the
gcc bug respectively.
A final patch was tagged along to fix a missing documentation for the
README file"
* tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add missing function triggers dump and cpudump to README
tracing: Fix anonymous unions in struct ftrace_event_call
tracepoint: Fix sparse warnings in tracepoint.c
tracepoint: Simplify tracepoint module search
tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
Linus Torvalds [Sat, 12 Apr 2014 19:38:53 +0000 (12:38 -0700)]
Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris.
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
Al Viro [Fri, 11 Apr 2014 16:01:03 +0000 (12:01 -0400)]
missing bits of "splice: fix racy pipe->buffers uses"
that commit has fixed only the parts of that mess in fs/splice.c itself;
there had been more in several other ->splice_read() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 3 Apr 2014 14:27:17 +0000 (10:27 -0400)]
cifs: fix the race in cifs_writev()
O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.
Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 4 Apr 2014 02:44:19 +0000 (22:44 -0400)]
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
ceph_osdc_put_request(ERR_PTR(-error)) oopses. What we want there
is break, not goto out.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Daniel Borkmann [Fri, 11 Apr 2014 11:22:00 +0000 (13:22 +0200)]
pktgen: be friendly to LLTX devices
Similarly to commit
43279500deca ("packet: respect devices with
LLTX flag in direct xmit"), we can basically apply the very same
to pktgen. This will help testing against LLTX devices such as
dummy driver (or others), which only have a single netdevice txq
and would otherwise require locking their txq from pktgen side
while e.g. in dummy case, we would not need any locking. Fix this
by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will
be respected.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 11 Apr 2014 09:54:31 +0000 (17:54 +0800)]
r8152: check RTL8152_UNPLUG
When the device is unplugged, the driver would try to disable the
device. Add checking the flag of RTL8152_UNPLUG to skip setting
the device when it is unplugged. This could shorten the time of
unloading the driver.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Zyngier [Fri, 11 Apr 2014 09:46:17 +0000 (10:46 +0100)]
net: sun4i-emac: add promiscuous support
The sun4i-emac driver is rather primitive, and doesn't support
promiscuous mode. This makes usage such as bridging impossible,
which is a shame on virtualization capable HW such as the
Allwinner A20.
The fix is fairly simple: move the RX setup code to the ndo_set_rx_mode
vector, and add the required HW configuration when IFF_PROMISC is passed
by the core code.
This has been tested on a generic A20 box running a few virtual
machines hanging off a bridge with the EMAC chip as the link to the
outside world.
Cc: Stefan Roese <sr@denx.de>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Stefan Roese <sr@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Duan Jiong [Fri, 11 Apr 2014 08:37:37 +0000 (16:37 +0800)]
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Miao [Fri, 11 Apr 2014 18:07:27 +0000 (02:07 +0800)]
blackfin: cleanup board files
using IS_ENABLED() macro instead of defined(CONFIG_XXX) || defined(CONFIG_XXX_MODULE)
Signed-off-by: Steven Miao <realmz6@gmail.com>