From: David S. Miller
Date: Mon, 2 Jul 2018 00:06:24 +0000 (+0900)
Subject: Merge branch 'xps-symmretric-queue-selection'
X-Git-Url: http://git.cdn.openwrt.org/?a=commitdiff_plain;h=97680ade43dc6a8ad2451389d66b0492458196d4;p=openwrt%2Fstaging%2Fblogic.git

Merge branch 'xps-symmretric-queue-selection'

Amritha Nambiar says:

====================
Symmetric queue selection using XPS for Rx queues

This patch series implements support for Tx queue selection based on
an Rx queue(s) map. This is done by configuring the Rx queue(s) map per
Tx queue using a sysfs attribute. If the user configuration for Rx
queues does not apply, then the Tx queue selection falls back to XPS
using CPUs and finally to hashing.

XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS needs to be
enabled. By default no receive queues are configured for the Tx queue.

- /sys/class/net//queues/tx-*/xps_rxqs

A set of receive queues can be mapped to a set of transmit queues
(many:many), although the common use case is a 1:1 mapping. This
enables sending packets on the same Tx-Rx queue association, which is
useful for busy-polling multi-threaded workloads where it is not
possible to pin the threads to a CPU. This is a rework of Sridhar's
patch for symmetric queueing via socket option:
https://www.spinics.net/lists/netdev/msg453106.html

Testing Hints:
Kernel: Linux 4.17.0-rc7+

Interface:
  driver: ixgbe
  version: 5.1.0-k
  firmware-version: 0x00015e0b

Configuration:
  ethtool -L $iface combined 16
  ethtool -C $iface rx-usecs 1000
  sysctl net.core.busy_poll=1000
  ATR disabled:
  ethtool -K $iface ntuple on

Workload:
Modified memcached that changes the thread selection policy to be based
on the incoming rx-queue of a connection using the SO_INCOMING_NAPI_ID
socket option. The default is round-robin.
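
The memcached change itself is not shown in this message. As a rough
sketch only of what a thread selection policy keyed on
SO_INCOMING_NAPI_ID might look like, the Python below maps each distinct
NAPI ID to one worker and falls back to round-robin when no ID is
available. Assumptions, none of which come from the series: the
WorkerSelector class and its pick() method are hypothetical names, the
value 56 is taken from asm-generic/socket.h for builds where Python's
socket module does not export the constant, and a NAPI ID of 0 is
treated as "not recorded".

```python
import socket

# Assumption: 56 is SO_INCOMING_NAPI_ID in asm-generic/socket.h (x86_64);
# Python's socket module may not export this constant.
SO_INCOMING_NAPI_ID = getattr(socket, "SO_INCOMING_NAPI_ID", 56)


def incoming_napi_id(conn):
    """Return the NAPI ID recorded for an accepted connection (assumed 0 if none)."""
    return conn.getsockopt(socket.SOL_SOCKET, SO_INCOMING_NAPI_ID)


class WorkerSelector:
    """Hypothetical policy: one worker per NAPI ID, round-robin otherwise."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.napi_to_worker = {}  # NAPI ID -> worker index
        self.next_rr = 0          # cursor for the round-robin fallback

    def _round_robin(self):
        worker = self.next_rr
        self.next_rr = (self.next_rr + 1) % self.num_workers
        return worker

    def pick(self, napi_id):
        # No NAPI ID recorded: behave like the default round-robin policy.
        if napi_id == 0:
            return self._round_robin()
        # First connection seen on this NAPI ID claims the next worker slot.
        if napi_id not in self.napi_to_worker:
            slot = len(self.napi_to_worker) % self.num_workers
            self.napi_to_worker[napi_id] = slot
        return self.napi_to_worker[napi_id]
```

A server loop would call pick(incoming_napi_id(conn)) after accept(), so
that connections arriving on the same Rx queue are served by the same
worker thread.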

Default: No rxqs_map configured
Symmetric queues: Enable rxqs_map for all queues 1:1 mapped to Tx queue

System:
  Architecture: x86_64
  CPU(s): 72
  Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

16 threads 400K requests/sec
=============================
-------------------------------------------------------------------------------
                                                 Default       Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max (usec)                           4/51/2215     2/30/5163
intr/sec                                         26655         18606
contextswitch/sec                                5145          4044
insn per cycle                                   0.43          0.72
cache-misses (% of all cache refs)               6.919         4.310
L1-dcache-load-misses (% of all L1-dcache hits)  4.49          3.29
LLC-load-misses (% of all LL-cache hits)         13.26         8.96
-------------------------------------------------------------------------------

32 threads 400K requests/sec
=============================
-------------------------------------------------------------------------------
                                                 Default       Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max (usec)                           10/112/5562   9/46/4637
intr/sec                                         30456         27666
contextswitch/sec                                7552          5133
insn per cycle                                   0.41          0.49
cache-misses (% of all cache refs)               9.357         2.769
L1-dcache-load-misses (% of all L1-dcache hits)  4.09          3.98
LLC-load-misses (% of all LL-cache hits)         12.96         3.96
-------------------------------------------------------------------------------

16 threads 800K requests/sec
=============================
-------------------------------------------------------------------------------
                                                 Default       Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max (usec)                           5/151/4989    9/69/2611
intr/sec                                         35686         22907
contextswitch/sec                                25522         12281
insn per cycle                                   0.67          0.74
cache-misses (% of all cache refs)               8.652         6.38
L1-dcache-load-misses (% of all L1-dcache hits)  3.19          2.86
LLC-load-misses (% of all LL-cache hits)         16.53         11.99
-------------------------------------------------------------------------------

32 threads 800K requests/sec
=============================
-------------------------------------------------------------------------------
                                                 Default       Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max (usec)                           6/163/6152    8/88/4209
intr/sec                                         47079         26548
contextswitch/sec                                42190         39168
insn per cycle                                   0.45          0.54
cache-misses (% of all cache refs)               8.798         4.668
L1-dcache-load-misses (% of all L1-dcache hits)  6.55          6.29
LLC-load-misses (% of all LL-cache hits)         13.91         10.44
-------------------------------------------------------------------------------

v6:
- Changed the names of some functions to begin with net_if.
- Cleaned up sk_tx_queue_set/sk_rx_queue_set functions.
- Added sk_rx_queue_clear to make it consistent with tx_queue_mapping
  initialization.
====================

Signed-off-by: David S. Miller
---

97680ade43dc6a8ad2451389d66b0492458196d4
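
Note on the "Symmetric queues" configuration: the exact commands used to
set up the 1:1 rxqs_map are not given in this message. The Python sketch
below shows one way such a mapping could be generated, under these
assumptions: xps_rxqs accepts a hexadecimal bitmap of Rx queue indexes
in the same comma-separated 32-bit-word format as xps_cpus, the elided
path component is the interface name, and the helper names (rxq_mask,
symmetric_rxqs_map, apply_map) are hypothetical.

```python
from pathlib import Path


def rxq_mask(rx_queues):
    """Hex bitmap with one bit per Rx queue index, e.g. [0, 2] -> '5'."""
    mask = 0
    for q in rx_queues:
        mask |= 1 << q
    # Split into 32-bit words, least significant word first.
    words = []
    while True:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    # Emit most significant word first; zero-pad the lower words.
    rev = words[::-1]
    return ",".join([format(rev[0], "x")] + [format(w, "08x") for w in rev[1:]])


def symmetric_rxqs_map(iface, num_queues, sysfs_root="/sys/class/net"):
    """Build a 1:1 map: the xps_rxqs file of tx-N gets a mask selecting rx-N."""
    base = Path(sysfs_root) / iface / "queues"
    return {
        base / f"tx-{n}" / "xps_rxqs": rxq_mask([n])
        for n in range(num_queues)
    }


def apply_map(mapping):
    """Write each mask to its sysfs file (assumes root and CONFIG_XPS)."""
    for path, mask in mapping.items():
        path.write_text(mask + "\n")
```

With the 16 combined queues from the configuration above, a call such as
apply_map(symmetric_rxqs_map("eth0", 16)) would write masks 1, 2, 4, ...
8000 to tx-0 through tx-15; "eth0" is a placeholder interface name.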