md: batch flush requests.
Currently if many flush requests are submitted to an md device in quick
succession, they are serialized and can take a long time to process.
We don't really need to call flush all those times - a single flush call
can satisfy all requests submitted before it started.
So keep track of when the current flush started and when it finished,
and allow any pending flush that was requested before the flush started
to complete without waiting any longer.
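The idea can be illustrated with a small userspace sketch. This is not
the code in this patch: the struct, the function names and the logical
clock are all hypothetical, chosen only to show the bookkeeping. It
assumes every event gets a unique, increasing timestamp, so a request is
covered by a completed flush exactly when that flush started after the
request was made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration; not the kernel's struct mddev. */
struct flush_tracker {
	unsigned long clock;          /* logical time, bumped on every event */
	unsigned long start_flush;    /* when the most recent flush started */
	unsigned long last_flush;     /* start time of the last completed flush */
	unsigned long flushes_issued; /* flushes actually sent to the device */
	bool in_flight;               /* a flush is currently running */
};

/* Record the arrival of a flush request and return its timestamp. */
static unsigned long request_flush(struct flush_tracker *t)
{
	return ++t->clock;
}

/* Start a real flush; only one may run at a time in this model. */
static void begin_flush(struct flush_tracker *t)
{
	assert(!t->in_flight);
	t->start_flush = ++t->clock;
	t->flushes_issued++;
	t->in_flight = true;
}

/* The running flush finished: remember when it had started. */
static void end_flush(struct flush_tracker *t)
{
	t->last_flush = t->start_flush;
	t->in_flight = false;
}

/* A request is satisfied if a completed flush started after it arrived. */
static bool already_satisfied(const struct flush_tracker *t,
			      unsigned long requested_at)
{
	return t->last_flush > requested_at;
}

/*
 * Replay a burst: three requests arrive, one flush runs, and a fourth
 * request arrives while that flush is in progress. Returns the number
 * of flushes the device had to perform.
 */
static unsigned long replay_burst(void)
{
	struct flush_tracker t = {0};
	unsigned long req[4];
	int i;

	req[0] = request_flush(&t);
	req[1] = request_flush(&t);
	req[2] = request_flush(&t);
	begin_flush(&t);
	req[3] = request_flush(&t);	/* too late for the running flush */
	end_flush(&t);

	for (i = 0; i < 4; i++) {
		if (!already_satisfied(&t, req[i])) {
			begin_flush(&t);
			end_flush(&t);
		}
	}
	return t.flushes_issued;
}
```

In this model the first three requests share one flush, and only the
request that arrived after that flush had started needs a second one, so
four requests cost two flushes instead of four.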
Test results from Xiao:
The test was run with dbench on a raid10 device built from 4 SSDs.
1. The latest linux stable kernel
Operation                   Count   AvgLat    MaxLat
----------------------------------------------------
Deltree                       768   10.509    78.305
Flush                     2078376    0.013    10.094
Close                    21787697    0.019    18.821
LockX                       96580    0.007     3.184
Mkdir                         384    0.008     0.062
Rename                    1255883    0.191    23.534
ReadX                    46495589    0.020    14.230
WriteX                   14790591    7.123    60.706
Unlink                    5989118    0.440    54.551
UnlockX                     96580    0.005     2.736
FIND_FIRST               10393845    0.042    12.079
SET_FILE_INFORMATION      2415558    0.129    10.088
QUERY_FILE_INFORMATION    4711725    0.005     8.462
QUERY_PATH_INFORMATION   26883327    0.032    21.715
QUERY_FS_INFORMATION      4929409    0.010     8.238
NTCreateX                29660080    0.100    53.268
Throughput 1034.88 MB/sec (sync open) 128 clients 128 procs
max_latency=60.712 ms
2. With patch1 "Revert "MD: fix lock contention for flush bios""
Operation                   Count   AvgLat    MaxLat
----------------------------------------------------
Deltree                       256    8.326    36.761
Flush                      693291    3.974   180.269
Close                     7266404    0.009    36.929
LockX                       32160    0.006     0.840
Mkdir                         128    0.008     0.021
Rename                     418755    0.063    29.945
ReadX                    15498708    0.007     7.216
WriteX                    4932310   22.482   267.928
Unlink                    1997557    0.109    47.553
UnlockX                     32160    0.004     1.110
FIND_FIRST                3465791    0.036     7.320
SET_FILE_INFORMATION       805825    0.015     1.561
QUERY_FILE_INFORMATION    1570950    0.005     2.403
QUERY_PATH_INFORMATION    8965483    0.013    14.277
QUERY_FS_INFORMATION      1643626    0.009     3.314
NTCreateX                 9892174    0.061    41.278
Throughput 345.009 MB/sec (sync open) 128 clients 128 procs
max_latency=267.939 ms
3. With patch1 and patch2
Operation                   Count   AvgLat    MaxLat
----------------------------------------------------
Deltree                       768    9.570    54.588
Flush                     2061354    0.666    15.102
Close                    21604811    0.012    25.697
LockX                       95770    0.007     1.424
Mkdir                         384    0.008     0.053
Rename                    1245411    0.096    12.263
ReadX                    46103198    0.011    12.116
WriteX                   14667988    7.375    60.069
Unlink                    5938936    0.173    30.905
UnlockX                     95770    0.005     4.147
FIND_FIRST               10306407    0.041    11.715
SET_FILE_INFORMATION      2395987    0.048     7.640
QUERY_FILE_INFORMATION    4672371    0.005     9.291
QUERY_PATH_INFORMATION   26656735    0.018    19.719
QUERY_FS_INFORMATION      4887940    0.010     7.654
NTCreateX                29410811    0.059    28.551
Throughput 1026.21 MB/sec (sync open) 128 clients 128 procs
max_latency=60.075 ms
Cc: <stable@vger.kernel.org> # v4.19+
Tested-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>