xprtrdma: Fix latency regression on NUMA NFS/RDMA clients
author Chuck Lever <chuck.lever@oracle.com>
Wed, 28 Feb 2018 20:30:27 +0000 (15:30 -0500)
committer Anna Schumaker <Anna.Schumaker@Netapp.com>
Tue, 10 Apr 2018 20:06:22 +0000 (16:06 -0400)
With v4.15, on one of my NFS/RDMA clients I measured a near
doubling in the latency of small read and write system calls. There
was no change in server round trip time. The extra latency appears
in the whole RPC execution path.

"git bisect" settled on commit ccede7598588 ("xprtrdma: Spread reply
processing over more CPUs").

After some experimentation, I found that leaving the WQ bound and
allowing the scheduler to pick the dispatch CPU seems to eliminate
the long latencies, and it does not introduce any new regressions.

The fix is implemented by reverting only the part of
commit ccede7598588 ("xprtrdma: Spread reply processing over more
CPUs") that dispatches RPC replies specifically on the CPU where the
matching RPC call was made.

Interestingly, saving the CPU number and later queuing reply
processing there was effective _only_ for NFS READ and WRITE
requests. On my NUMA client, in-kernel RPC reply processing for
asynchronous RPCs was dispatched on the same CPU where the RPC call
was made, as expected. However, synchronous RPCs seem to get their
reply dispatched on a CPU other than the one where the call was
placed, every time.
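
The dispatch change described above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: the names
model_req, call_cpu, model_policy and model_dispatch_cpu are
hypothetical, and it assumes that queue_work() on a bound workqueue
queues the item on the CPU that submitted it, as the kernel-doc
comment for queue_work() describes.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_req {
	int call_cpu;	/* stands in for the removed req->rl_cpu field */
};

enum model_policy {
	MODEL_QUEUE_WORK_ON,	/* before: queue_work_on(req->rl_cpu, ...) */
	MODEL_QUEUE_WORK,	/* after:  queue_work(...) */
};

/*
 * Return the CPU on which the reply work item would be queued.
 * completion_cpu is the CPU running the reply handler when it
 * submits the work item.
 */
static int model_dispatch_cpu(enum model_policy policy,
			      const struct model_req *req,
			      int completion_cpu)
{
	assert(req != NULL);

	/* Old behavior: pin reply processing to the CPU saved at call time. */
	if (policy == MODEL_QUEUE_WORK_ON)
		return req->call_cpu;

	/* New behavior (assumed): queue on the submitting CPU. */
	return completion_cpu;
}
```

In this model, a request with call_cpu 3 whose reply handler runs on
CPU 0 is dispatched on CPU 3 under MODEL_QUEUE_WORK_ON and on CPU 0
under MODEL_QUEUE_WORK. With the saved CPU no longer consulted, the
rl_cpu field and the smp_processor_id() call can be dropped.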

Fixes: ccede7598588 ("xprtrdma: Spread reply processing over ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
net/sunrpc/xprtrdma/rpc_rdma.c
net/sunrpc/xprtrdma/transport.c
net/sunrpc/xprtrdma/xprt_rdma.h

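diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f0855a959a278dac5c8e6459cd94df244f87ee2c..4bc0f4d94a0168e4dadd970a6049b16a1c466008 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c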
@@ -1366,7 +1366,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 
        trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);
 
-       queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
+       queue_work(rpcrdma_receive_wq, &rep->rr_work);
        return;
 
 out_badstatus:
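diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 4b1ecfe979cf4d5b0df54679fdd127f7b24847f8..f86021e3b85375f1d79a1c1e1a52be870f5d8830 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c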
@@ -52,7 +52,6 @@
 #include <linux/slab.h>
 #include <linux/seq_file.h>
 #include <linux/sunrpc/addr.h>
-#include <linux/smp.h>
 
 #include "xprt_rdma.h"
 
@@ -651,7 +650,6 @@ xprt_rdma_allocate(struct rpc_task *task)
        if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
                goto out_fail;
 
-       req->rl_cpu = smp_processor_id();
        req->rl_connect_cookie = 0;     /* our reserved value */
        rpcrdma_set_xprtdata(rqst, req);
        rqst->rq_buffer = req->rl_sendbuf->rg_base;
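diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 69883a960a3ffbebcc1cfc4c400ab1879f4a0c90..430a6de8300e50849514bc22e6c110c93fed3675 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h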
@@ -334,7 +334,6 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
        struct list_head        rl_list;
-       int                     rl_cpu;
        unsigned int            rl_connect_cookie;
        struct rpcrdma_buffer   *rl_buffer;
        struct rpcrdma_rep      *rl_reply;