1 From: Pablo Neira Ayuso <pablo@netfilter.org>
2 Date: Wed, 24 Mar 2021 02:30:55 +0100
3 Subject: [PATCH] docs: nf_flowtable: update documentation with
4 enhancements
5
6 This patch updates the flowtable documentation to describe recent
7 enhancements:
8
9 - Offload action is available after the first packets go through the
10 classic forwarding path.
11 - IPv4 and IPv6 are supported. Only TCP and UDP layer 4 are supported at
12 this stage.
13 - Tuple has been augmented to track VLAN id and PPPoE session id.
14 - Bridge and IP forwarding integration, including bridge VLAN filtering
15 support.
16 - Hardware offload support.
17 - Describe the [OFFLOAD] and [HW_OFFLOAD] tags in the conntrack table
18 listing.
19 - Replace 'flow offload' by 'flow add' in example rulesets (preferred
20 syntax).
21 - Describe existing cache limitations.
22
23 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
24 ---
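Note for readers of this patch (not part of the commit message): a minimal
sketch of how the updated examples might combine into a single ruleset. It
assumes the eth0/eth1 devices and the x/f/y names used in the documentation
examples. The 'meta l4proto { tcp, udp }' match is an illustrative choice that
is not taken from the patch; it is used because the table family is 'inet' and
the flowtable handles TCP and UDP over both IPv4 and IPv6. The flowtable
'counter' statement needs Linux kernel 5.7 or later.

    table inet x {
        flowtable f {
            # Attach the flowtable to the real netdevices at ingress.
            hook ingress priority 0; devices = { eth0, eth1 };
            # Sync packet/byte counters with the conntrack entry.
            counter
        }
        chain y {
            type filter hook forward priority 0; policy accept;
            # Assumed match: add TCP and UDP flows to the flowtable.
            meta l4proto { tcp, udp } flow add @f
            # Stops increasing for a flow once it takes the fastpath.
            counter
        }
    }

With such a ruleset loaded, offloaded flows would be expected to show the
[OFFLOAD] tag in the 'conntrack -L' listing, as described in the patch.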
25
26 --- a/Documentation/networking/nf_flowtable.rst
27 +++ b/Documentation/networking/nf_flowtable.rst
28 @@ -4,35 +4,38 @@
29 Netfilter's flowtable infrastructure
30 ====================================
31
32 -This documentation describes the software flowtable infrastructure available in
33 -Netfilter since Linux kernel 4.16.
34 +This documentation describes the Netfilter flowtable infrastructure which allows
35 +you to define a fastpath through the flowtable datapath. This infrastructure
36 +also provides hardware offload support. The flowtable supports the layer 3
37 +IPv4 and IPv6 and the layer 4 TCP and UDP protocols.
38
39 Overview
40 --------
41
42 -Initial packets follow the classic forwarding path, once the flow enters the
43 -established state according to the conntrack semantics (ie. we have seen traffic
44 -in both directions), then you can decide to offload the flow to the flowtable
45 -from the forward chain via the 'flow offload' action available in nftables.
46 -
47 -Packets that find an entry in the flowtable (ie. flowtable hit) are sent to the
48 -output netdevice via neigh_xmit(), hence, they bypass the classic forwarding
49 -path (the visible effect is that you do not see these packets from any of the
50 -netfilter hooks coming after the ingress). In case of flowtable miss, the packet
51 -follows the classic forward path.
52 -
53 -The flowtable uses a resizable hashtable, lookups are based on the following
54 -7-tuple selectors: source, destination, layer 3 and layer 4 protocols, source
55 -and destination ports and the input interface (useful in case there are several
56 -conntrack zones in place).
57 -
58 -Flowtables are populated via the 'flow offload' nftables action, so the user can
59 -selectively specify what flows are placed into the flow table. Hence, packets
60 -follow the classic forwarding path unless the user explicitly instruct packets
61 -to use this new alternative forwarding path via nftables policy.
62 +Once the first packet of the flow successfully goes through the IP forwarding
63 +path, from the second packet on, you might decide to offload the flow to the
64 +flowtable through your ruleset. The flowtable infrastructure provides a rule
65 +action that allows you to specify when to add a flow to the flowtable.
66 +
67 +A packet that finds a matching entry in the flowtable (ie. flowtable hit) is
68 +transmitted to the output netdevice via neigh_xmit(), hence, packets bypass the
69 +classic IP forwarding path (the visible effect is that you do not see these
70 +packets from any of the Netfilter hooks coming after ingress). If
71 +there is no matching entry in the flowtable (ie. flowtable miss), the packet
72 +follows the classic IP forwarding path.
73 +
74 +The flowtable uses a resizable hashtable. Lookups are based on the following
75 +n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3
76 +source and destination, layer 4 source and destination ports and the input
77 +interface (useful in case there are several conntrack zones in place).
78 +
79 +The 'flow add' action allows you to populate the flowtable: the user selectively
80 +specifies what flows are placed into the flowtable. Hence, packets follow the
81 +classic IP forwarding path unless the user explicitly instructs flows to use
82 +this new alternative forwarding path via policy.
83
84 -This is represented in Fig.1, which describes the classic forwarding path
85 -including the Netfilter hooks and the flowtable fastpath bypass.
86 +The flowtable datapath is represented in Fig.1, which describes the classic IP
87 +forwarding path including the Netfilter hooks and the flowtable fastpath bypass.
88
89 ::
90
91 @@ -67,11 +70,13 @@ including the Netfilter hooks and the fl
92 Fig.1 Netfilter hooks and flowtable interactions
93
94 The flowtable entry also stores the NAT configuration, so all packets are
95 -mangled according to the NAT policy that matches the initial packets that went
96 -through the classic forwarding path. The TTL is decremented before calling
97 -neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
98 -path given that the transport selectors are missing, therefore flowtable lookup
99 -is not possible.
100 +mangled according to the NAT policy that is specified from the classic IP
101 +forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented
102 +traffic is passed up to follow the classic IP forwarding path given that the
103 +transport header is missing; in this case, flowtable lookups are not possible.
104 +TCP RST and FIN packets are also passed up to the classic IP forwarding path to
105 +release the flow gracefully. Packets that exceed the MTU are also passed up to
106 +the classic forwarding path to report packet-too-big ICMP errors to the sender.
107
108 Example configuration
109 ---------------------
110 @@ -85,7 +90,7 @@ flowtable and add one rule to your forwa
111 }
112 chain y {
113 type filter hook forward priority 0; policy accept;
114 - ip protocol tcp flow offload @f
115 + ip protocol tcp flow add @f
116 counter packets 0 bytes 0
117 }
118 }
119 @@ -103,6 +108,117 @@ flow is offloaded, you will observe that
120 does not get updated for the packets that are being forwarded through the
121 forwarding bypass.
122
123 +You can identify offloaded flows through the [OFFLOAD] tag when listing your
124 +connection tracking table.
125 +
126 +::
127 + # conntrack -L
128 + tcp 6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2
129 +
130 +
131 +Layer 2 encapsulation
132 +---------------------
133 +
134 +Since Linux kernel 5.13, the flowtable infrastructure discovers the real
135 +netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath
136 +parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the
137 +VLAN ID / PPPoE session ID which are used for the flowtable lookups. The
138 +flowtable datapath also deals with layer 2 decapsulation.
139 +
140 +You do not need to add the PPPoE and the VLAN devices to your flowtable,
141 +instead the real device is sufficient for the flowtable to track your flows.
142 +
143 +Bridge and IP forwarding
144 +------------------------
145 +
146 +Since Linux kernel 5.13, you can add bridge ports to the flowtable. The
147 +flowtable infrastructure discovers the topology behind the bridge device. This
148 +allows the flowtable to define a fastpath bypass between the bridge ports
149 +(represented as eth1 and eth2 in the example figure below) and the gateway
150 +device (represented as eth0) in your switch/router.
151 +
152 +::
153 + fastpath bypass
154 + .-------------------------.
155 + / \
156 + | IP forwarding |
157 + | / \ \/
158 + | br0 eth0 ..... eth0
159 + . / \ *host B*
160 + -> eth1 eth2
161 + . *switch/router*
162 + .
163 + .
164 + eth0
165 + *host A*
166 +
167 +The flowtable infrastructure also supports bridge VLAN filtering actions
168 +such as PVID and untagged. You can also stack a classic VLAN device on top of
169 +your bridge port.
170 +
171 +If you would like your flowtable to define a fastpath between your bridge
172 +ports and your IP forwarding path, you have to add your bridge ports (as
173 +represented by the real netdevice) to your flowtable definition.
174 +
175 +Counters
176 +--------
177 +
178 +The flowtable can synchronize packet and byte counters with the existing
179 +connection tracking entry by specifying the counter statement in your flowtable
180 +definition, e.g.
181 +
182 +::
183 + table inet x {
184 + flowtable f {
185 + hook ingress priority 0; devices = { eth0, eth1 };
186 + counter
187 + }
188 + ...
189 + }
190 +
191 +Counter support is available since Linux kernel 5.7.
192 +
193 +Hardware offload
194 +----------------
195 +
196 +If your network device provides hardware offload support, you can turn it on by
197 +means of the 'offload' flag in your flowtable definition, e.g.
198 +
199 +::
200 + table inet x {
201 + flowtable f {
202 + hook ingress priority 0; devices = { eth0, eth1 };
203 + flags offload;
204 + }
205 + ...
206 + }
207 +
208 +There is a workqueue that adds the flows to the hardware. Note that a few
209 +packets might still run over the flowtable software path until the workqueue has
210 +a chance to offload the flow to the network device.
211 +
212 +You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when
213 +listing your connection tracking table. Note that the [OFFLOAD] tag
214 +refers to the software offload mode, so there is a distinction between [OFFLOAD]
215 +which refers to the software flowtable fastpath and [HW_OFFLOAD] which refers
216 +to the hardware offload datapath being used by the flow.
217 +
218 +The flowtable hardware offload infrastructure also supports the DSA
219 +(Distributed Switch Architecture).
220 +
221 +Limitations
222 +-----------
223 +
224 +The flowtable behaves like a cache. The flowtable entries might get stale if
225 +either the destination MAC address or the egress netdevice that is used for
226 +transmission changes.
227 +
228 +This might be a problem if:
229 +
230 +- You run the flowtable in software mode and you combine bridge and IP
231 + forwarding in your setup.
232 +- Hardware offload is enabled.
233 +
234 More reading
235 ------------
236