powerpc/powernv/npu: Reduce eieio usage when issuing ATSD invalidates

There are two types of ATSDs issued to the NPU: invalidates targeting a
specific virtual address and invalidates targeting the whole address
space. In both cases prior to this change, the sequence was:

    for each NPU
        - Write the target address to the XTS_ATSD_AVA register
        - EIEIO
        - Write the launch value to issue the ATSD

First, a target address is not required when invalidating the whole
address space, so that write and the EIEIO have been removed. The AP
(size) field in the launch is not needed either.

Second, for per-address invalidates the above sequence is inefficient in
the common case of multiple NPUs because an EIEIO is issued per NPU. This
unnecessarily forces the launches of later ATSDs to be ordered with the
launches of earlier ones. The new sequence only issues a single EIEIO:

    for each NPU
        - Write the target address to the XTS_ATSD_AVA register
    EIEIO
    for each NPU
        - Write the launch value to issue the ATSD
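
The two per-address sequences above can be compared with a small userspace model. This is only a sketch, not the kernel code: `struct npu_model`, `model_eieio()`, `invalidate_va_old()` and `invalidate_va_new()` are hypothetical names, the registers are plain memory rather than MMIO, and a C11 fence stands in for the real EIEIO instruction. Each function returns the number of barriers it issued.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one NPU's ATSD registers (not the real layout). */
struct npu_model {
    volatile uint64_t ava;    /* models the XTS_ATSD_AVA register */
    volatile uint64_t launch; /* models the launch register */
};

static unsigned barrier_count;

/* Stand-in for EIEIO: counts the call and issues a full C11 fence. */
static void model_eieio(void)
{
    barrier_count++;
    atomic_thread_fence(memory_order_seq_cst);
}

/* Old sequence: address write, barrier, launch, repeated for every NPU. */
static unsigned invalidate_va_old(struct npu_model *npus, size_t n,
                                  uint64_t va, uint64_t launch)
{
    barrier_count = 0;
    for (size_t i = 0; i < n; i++) {
        npus[i].ava = va;
        model_eieio();
        npus[i].launch = launch;
    }
    return barrier_count;
}

/* New sequence: all address writes, one barrier, then all launches. */
static unsigned invalidate_va_new(struct npu_model *npus, size_t n,
                                  uint64_t va, uint64_t launch)
{
    barrier_count = 0;
    for (size_t i = 0; i < n; i++)
        npus[i].ava = va;
    model_eieio();
    for (size_t i = 0; i < n; i++)
        npus[i].launch = launch;
    return barrier_count;
}
```

With two modelled NPUs the old function issues two barriers and the new one issues one. Every address write is still ordered before every launch.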

Performance results were gathered using a microbenchmark which creates a
1G allocation then uses mprotect with PROT_NONE to trigger invalidates in
strides across the allocation.
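
A rough sketch of what such a microbenchmark might look like follows. The original benchmark source is not part of this commit, so everything here is an assumption: the function name `mprotect_stride_rate()` is hypothetical, each mprotect call is assumed to cover one full stride, and the pages are not touched beforehand. On a machine without an NPU-attached GPU no ATSDs are issued, so this only measures the cost of mprotect itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and clock_gettime under strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Map `total` bytes, then mprotect the mapping to PROT_NONE one `stride`
 * at a time. Returns the rate in GB/s (10^9 bytes per second), or -1.0 on
 * any failure. `stride` must be a multiple of the page size.
 */
static double mprotect_stride_rate(size_t total, size_t stride)
{
    if (stride == 0 || total < stride)
        return -1.0;

    char *buf = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off + stride <= total; off += stride) {
        if (mprotect(buf + off, stride, PROT_NONE) != 0) {
            munmap(buf, total);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    munmap(buf, total);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return (double)total / secs / 1e9;
}
```

Calling `mprotect_stride_rate((size_t)1 << 30, (size_t)64 << 10)` would correspond to the 1G allocation with a 64K stride.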

With only a single NPU active (one GPU) the difference is in the noise for
both types of invalidates (+/-1%).

With two NPUs active (on a 6-GPU system) the effect is more noticeable:

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         5.9        6.5          10%
1M         31.2       33.4           7%
2M         36.3       38.7           7%
4M        322.6      356.7          11%
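
The Speedup column is consistent with (After - Before) / Before rounded to the nearest percent. The helper below is a hypothetical illustration of that arithmetic, not code from the patch.

```c
/* Percentage speedup of `after` over `before`, rounded to the nearest
 * whole percent. Assumes before > 0 and after >= before. */
static long speedup_percent(double before, double after)
{
    double pct = (after - before) / before * 100.0;
    return (long)(pct + 0.5);
}
```

For example, `speedup_percent(5.9, 6.5)` yields 10 and `speedup_percent(322.6, 356.7)` yields 11, matching the 64K and 4M rows.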

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

author	Mark Hairgrove <mhairgrove@nvidia.com>
	Wed, 3 Oct 2018 18:51:32 +0000 (11:51 -0700)
committer	Michael Ellerman <mpe@ellerman.id.au>
	Thu, 4 Oct 2018 06:55:52 +0000 (16:55 +1000)
commit	7ead15a1442b25e12a6f0791a7c7a5a72d1f3a0c
tree	018304e727b07818a868ceea606969beb615f95c
parent	bad96de8d31ba65dc26645af5550135315ea0b19