Documentation/networking: add checksum-offloads.txt to explain LCO

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

parent 6fa79666e2
commit e8ae7b000e

@@ -44,6 +44,8 @@ can.txt
 	- documentation on CAN protocol family.
 cdc_mbim.txt
 	- 3G/LTE USB modem (Mobile Broadband Interface Model)
+checksum-offloads.txt
+	- Explanation of checksum offloads; LCO, RCO
 cops.txt
 	- info on the COPS LocalTalk Linux driver
 cs89x0.txt

119
Documentation/networking/checksum-offloads.txt
Normal file
@@ -0,0 +1,119 @@
Checksum Offloads in the Linux Networking Stack


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of checksum offload capabilities of various NICs.

The following technologies are described:
 * TX Checksum Offload
 * LCO: Local Checksum Offload
 * RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:
 * RX Checksum Offload
 * CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained
in detail in comments near the top of include/linux/skbuff.h.
In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset. The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).
Because csum_offset cannot be negative, this ensures that the previous
value of the checksum field is included in the checksum computation, thus
it can be used to supply any needed corrections to the checksum (such as
the sum of the pseudo-header for UDP or TCP).
This interface only allows a single checksum to be offloaded. Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.
No offloading of the IP header checksum is performed; it is always done in
software. This is OK because when we build the IP header, we obviously
have it in cache, so summing it isn't expensive. It's also rather short.
The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be
edited or recomputed for each resulting segment. See the skbuff.h comment
(section 'E') for more details.
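
As a rough illustration of the contract described above, the following
user-space C sketch models what a device (or a software fallback) is
expected to do with csum_start and csum_offset. It is not kernel code: the
names csum_fold_sum() and fill_tx_csum() are hypothetical, the packet is a
plain byte array rather than an sk_buff, and the 32-bit accumulator assumes
a packet of at most 64KiB.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes, read as big-endian 16-bit words and
 * folded down to 16 bits.  An odd trailing byte is padded with a zero.
 * Hypothetical helper; assumes len is small enough (<= 64KiB) that the
 * 32-bit accumulator cannot overflow.
 */
static uint16_t csum_fold_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical model of the offload: sum from csum_start to the end of
 * the packet, *including* whatever is currently in the checksum field,
 * then write the complement of that sum at (csum_start + csum_offset).
 */
static void fill_tx_csum(uint8_t *pkt, size_t len,
			 size_t csum_start, size_t csum_offset)
{
	uint16_t csum;

	csum = (uint16_t)~csum_fold_sum(pkt + csum_start, len - csum_start);
	pkt[csum_start + csum_offset] = csum >> 8;
	pkt[csum_start + csum_offset + 1] = csum & 0xff;
}
```

Because the old contents of the checksum field take part in the sum, a
caller can pre-load that field with a correction (for example the
pseudo-header sum) before handing the packet over, and the final checksum
accounts for it without the device knowing anything about the protocol.
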

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.txt for more. Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start
and csum_offset given in the SKB; if it tries to deduce these itself in
hardware (as some NICs do) the driver should check that the values in the
SKB match those which the hardware will deduce, and if not, fall back to
checksumming in software instead (with skb_checksum_help or one of the
skb_csum_off_chk* functions as mentioned in include/linux/skbuff.h). This
is a pain, but that's what you get when hardware tries to be clever.

The stack should, for the most part, assume that checksum offload is
supported by the underlying device. The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.
That function compares the offload features requested by the SKB (which
may include other offloads besides TX Checksum Offload) and, if they are
not supported or enabled on the device (determined by netdev->features),
performs the corresponding offload in software. In the case of TX
Checksum Offload, that means calling skb_checksum_help(skb).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.
The ones-complement sum of a correctly checksummed TCP or UDP packet is
equal to the complement of the sum of the pseudo header, because
everything else gets 'cancelled out' by the checksum field. This is
because the sum was complemented before being written to the checksum
field.
More generally, this holds in any case where the 'IP-style' ones complement
checksum is used, and thus any checksum that TX Checksum Offload supports.
That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that _after the device has filled in that checksum_, the ones
complement sum from csum_start to the end of the packet will be equal to
the complement of _whatever value we put in the checksum field
beforehand_. This allows us to compute the outer checksum without looking
at the payload: we simply stop summing when we get to csum_start, then add
the complement of the 16-bit word at (csum_start + csum_offset).
Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of
the arithmetic.
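
The arithmetic can be checked with a small user-space C sketch. It is only
a model under stated assumptions, not the kernel's lco_csum(): the helper
names (ones_sum(), ones_fold(), region_csum(), lco_outer_csum()) are
hypothetical, the packet is a flat byte array, and the distance from the
outer header to csum_start is assumed to be even so that 16-bit words line
up. Like the in-tree helper, it seeds the sum with the complement of the
16-bit word currently stored at (csum_start + csum_offset).

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, to a running
 * ones-complement sum.  An odd trailing byte is padded with a zero.
 */
static uint32_t ones_sum(const uint8_t *data, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return sum;
}

/* Fold a 32-bit running sum down to 16 bits with end-around carry */
static uint16_t ones_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Plain 'IP-style' checksum over pkt[start..len), touching every byte.
 * Used here both as the reference result and to model the inner fill-in.
 */
static uint16_t region_csum(const uint8_t *pkt, size_t start, size_t len)
{
	return (uint16_t)~ones_fold(ones_sum(pkt + start, len - start, 0));
}

/* LCO sketch: outer checksum computed from the headers only.  Nothing at
 * or beyond csum_start is read except the 16-bit inner checksum field.
 */
static uint16_t lco_outer_csum(const uint8_t *pkt, size_t outer_start,
			       size_t csum_start, size_t csum_offset)
{
	const uint8_t *field = pkt + csum_start + csum_offset;
	uint16_t seed = (uint16_t)(((uint16_t)field[0] << 8) | field[1]);
	/* Start with the complement of the inner checksum adjustment */
	uint32_t sum = (uint16_t)~seed;

	/* Add the outer headers, including any outer adjustment already
	 * stored in the outer checksum field, and stop at csum_start.
	 */
	sum = ones_sum(pkt + outer_start, csum_start - outer_start, sum);
	return (uint16_t)~ones_fold(sum);
}
```

If lco_outer_csum() is called while the inner field still holds its
adjustment, and the inner checksum is filled in afterwards, the value it
returned matches what region_csum() produces over the whole outer
datagram, even though the payload was never read.
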

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for
the IPv6 equivalents, in udp6_set_csum().
It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the
whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be
possible to use LCO here as IPv6 GRE still uses an IP-style checksum.
All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle'
header. This does mean that the 'middle' header will get summed multiple
times, but there doesn't seem to be a way to avoid that without incurring
bigger costs (e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated
datagram, allowing the outer checksum to be offloaded. It does, however,
involve a change to the encapsulation protocols, which the receiver must
also support. For this reason, it is disabled by default.
RCO is detailed in the following Internet-Drafts:
https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
In Linux, RCO is implemented individually in each encapsulation protocol,
and most tunnel types have flags controlling its use. For instance, VXLAN
has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that
RCO should be used when transmitting to a given remote destination.

@@ -3705,6 +3705,8 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
+ * See Documentation/networking/checksum-offloads.txt for
+ * explanation of how this works.
  * Fill in outer checksum adjustment (e.g. with sum of outer
  * pseudo-header) before calling.
  * Also ensure that inner checksum is in linear data area.