kernel_optimize_test/net
Lawrence Brakmo 9aee400061 tcp: ack immediately when a cwr packet arrives
We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The
problem is triggered when the last packet of a request arrives CE
marked. The reply will carry the ECE mark causing TCP to shrink its cwnd
to 1 (because there are no packets in flight). When the 1st packet of
the next request arrives, the ACK was sometimes delayed even though it
is CWR marked, adding up to 40ms to the RPC latency.

This patch insures that CWR marked data packets arriving will be acked
immediately.

Packetdrill script to reproduce the problem:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
+0 > [ect01] . 4:4(0) ack 6501     // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257

Modified based on comments by Neal Cardwell <ncardwell@google.com>

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 16:20:35 -07:00
..
6lowpan
9p net/9p/client.c: put refcount of trans_mod in error case in parse_opts() 2018-07-14 11:11:09 -07:00
802
8021q net: fix use-after-free in GRO with ESP 2018-07-02 20:34:04 +09:00
appletalk Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
atm Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
ax25 Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
batman-adv batman-adv: Fix multicast TT issues with bogus ROAM flags 2018-06-23 10:29:33 +02:00
bluetooth Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
bpf bpf: fix panic due to oob in bpf_prog_test_run_skb 2018-07-11 16:10:57 -07:00
bpfilter bpfilter: include bpfilter_umh in assembly instead of using objcopy 2018-06-28 21:39:16 +09:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-06-16 07:39:34 +09:00
caif net: caif: Add a missing rcu_read_unlock() in caif_flow_cb 2018-07-21 16:14:39 -07:00
can Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
ceph The main piece is a set of libceph changes that revamps how OSD 2018-06-15 07:24:58 +09:00
core sock: fix sg page frag coalescing in sk_alloc_sg 2018-07-23 21:28:45 -07:00
dcb treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
dccp Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
decnet Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
dns_resolver KEYS: DNS: fix parsing multiple options 2018-07-16 11:22:14 -07:00
dsa net: dsa: add error handling for pskb_trim_rcsum 2018-06-11 14:19:38 -07:00
ethernet net: core: rework basic flow dissection helper 2018-05-08 00:02:36 -04:00
hsr
ieee802154 ieee802154: 6lowpan: set IFLA_LINK 2018-07-05 11:13:17 +02:00
ife net: sched: ife: check on metadata length 2018-04-22 21:12:00 -04:00
ipv4 tcp: ack immediately when a cwr packet arrives 2018-07-25 16:20:35 -07:00
ipv6 ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull 2018-07-24 16:35:58 -07:00
iucv Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
kcm Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
key Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
l2tp Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
l3mdev
lapb
llc Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
mac80211 nl80211/mac80211: allow non-linear skb in rx_control_port 2018-07-06 14:34:42 +02:00
mac802154 net/mac802154: disambiguate mac80215 vs mac802154 trace events 2018-03-28 22:55:18 +02:00
mpls net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ncsi net/ncsi: Use netdev_dbg for debug messages 2018-06-20 07:26:58 +09:00
netfilter netfilter: nf_tables: move dumper state allocation into ->start 2018-07-24 00:36:33 +02:00
netlabel audit: use inline function to get audit context 2018-05-14 17:24:18 -04:00
netlink Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
netrom Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
nfc net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL. 2018-07-18 10:51:45 -07:00
nsh nsh: set mac len based on inner packet 2018-07-12 16:55:29 -07:00
openvswitch treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
packet packet: reset network header if packet shorter than ll reserved space 2018-07-12 16:55:59 -07:00
phonet Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
psample
qrtr net: qrtr: Reset the node and port ID of broadcast messages 2018-07-05 20:20:03 +09:00
rds rds: clean up loopback rds_connections on netns deletion 2018-06-27 10:11:03 +09:00
rfkill rfkill: Create rfkill-none LED trigger 2018-05-23 11:26:45 +02:00
rose Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
rxrpc Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
sched net: sched: Using NULL instead of plain integer 2018-07-18 13:44:07 -07:00
sctp sctp: fix the issue that pathmtu may be set lower than MINSEGMENT 2018-07-04 21:36:34 +09:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-07-18 19:32:54 -07:00
strparser strparser: Remove early eaten to fix full tcp receive buffer stall 2018-06-28 21:37:26 +09:00
sunrpc NFS client bugfixes for Linux 4.18 2018-06-22 06:21:34 +09:00
switchdev
tipc tipc: make function tipc_net_finalize() thread safe 2018-07-07 19:49:02 +09:00
tls tls: check RCV_SHUTDOWN in tls_wait_data 2018-07-20 14:38:14 -07:00
unix Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
vmw_vsock Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
wimax
wireless cfg80211: never ignore user regulatory hint 2018-07-24 09:11:31 +02:00
x25 Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
xdp xsk: do not return EMSGSIZE in copy mode for packets larger than MTU 2018-07-13 15:34:31 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
compat.c net: support compat 64-bit time in {s,g}etsockopt 2018-04-27 19:46:06 -04:00
Kconfig net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
Makefile bpfilter: check compiler capability in Kconfig 2018-06-28 13:36:39 +09:00
socket.c net: handle NULL ->poll gracefully 2018-06-29 06:51:51 -07:00
sysctl_net.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00