kernel_optimize_test/net
Pablo Neira Ayuso c4832c7bbc netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking
Without this patch, if we receive a SYN packet from the client while
the firewall is out-of-sync, we let it go through. Then, if we see
the SYN/ACK reply coming from the server, we destroy the conntrack
entry and drop the packet to trigger a new retransmission. Then,
the retransmission from the client is used to start a new clean
session.

This patch improves the current handling. Basically, if we see an
unexpected SYN packet, we annotate the TCP options. Then, if we
see the reply SYN/ACK, this means that the firewall was indeed
out-of-sync. Therefore, we set up a clean new session from the existing
entry based on the annotated values.
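
The handling described above can be sketched in plain C. This is a simplified user-space model, not the code in the patch: the type names, field names, and function names below are all hypothetical, and the real tracker checks considerably more than the ACK number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's struct ip_ct_tcp. */
enum sketch_state { SK_ESTABLISHED, SK_SYN_RECV };
enum sketch_verdict { SK_ACCEPT, SK_IGNORE };

struct sketch_conn {
    enum sketch_state state;
    bool     syn_pending;   /* an unexpected SYN was seen and annotated */
    uint32_t syn_end;       /* sequence number that follows the SYN */
    uint8_t  syn_wscale;    /* annotated window-scale option */
    uint8_t  syn_flags;     /* annotated option flags (e.g. SACK permitted) */
    uint32_t client_end;    /* client-side tracking used by the new session */
    uint8_t  client_wscale;
    uint8_t  client_flags;
};

/* Unexpected SYN on an existing entry: remember its options and let the
 * packet through without touching the tracked state. */
enum sketch_verdict sketch_on_unexpected_syn(struct sketch_conn *c,
                                             uint32_t seq,
                                             uint8_t wscale,
                                             uint8_t flags)
{
    assert(c != NULL);
    c->syn_pending = true;
    c->syn_end = seq + 1;   /* a SYN consumes one sequence number */
    c->syn_wscale = wscale;
    c->syn_flags = flags;
    return SK_IGNORE;
}

/* SYN/ACK from the server: if it acknowledges the annotated SYN, the
 * entry was out-of-sync, so restart the session from the annotations. */
enum sketch_verdict sketch_on_synack(struct sketch_conn *c, uint32_t ack)
{
    assert(c != NULL);
    if (!c->syn_pending || ack != c->syn_end)
        return SK_IGNORE;

    c->client_end = c->syn_end;
    c->client_wscale = c->syn_wscale;
    c->client_flags = c->syn_flags;
    c->state = SK_SYN_RECV;
    c->syn_pending = false;
    return SK_ACCEPT;
}
```

In this model the SYN/ACK is accepted rather than dropped, so the client does not need to retransmit its SYN; the entry simply moves to SYN_RECV using the values recorded from the SYN.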

This patch adds two new 8-bit fields that fit in a 16-bit gap of
the ip_ct_tcp structure.
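
To illustrate why filling such a gap costs no memory, here is a small sketch. The two structs are hypothetical stand-ins, not the real ip_ct_tcp layout, and the result assumes an ABI where uint32_t is 4-byte aligned (true on common Linux targets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a 16-bit hole: the struct is padded to a
 * multiple of 4 bytes, so 2 bytes after last_win are unused. */
struct gap_before {
    uint32_t last_end;
    uint16_t last_win;
    /* 2 bytes of implicit padding */
};

/* Same layout with two 8-bit fields placed in the hole. */
struct gap_after {
    uint32_t last_end;
    uint16_t last_win;
    uint8_t  extra_a;   /* illustrative name */
    uint8_t  extra_b;   /* illustrative name */
};

/* Holds on ABIs where uint32_t requires 4-byte alignment. */
static_assert(sizeof(struct gap_after) == sizeof(struct gap_before),
              "new fields must fit in the existing padding");
```

Because the new members land in bytes the compiler was already reserving as padding, the structure size and the offsets of the existing members stay the same.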

This patch is particularly useful for conntrackd, since the
asynchronous nature of state synchronization means that backup
nodes are not always perfect copies of the master. This helps
to improve recovery under some worst-case scenarios.

I have tested this by creating lots of conntrack entries in the wrong
state:

for ((i=1024;i<65535;i++)); do conntrack -I -p tcp -s 192.168.2.101 -d 192.168.2.2 --sport $i --dport 80 -t 800 --state ESTABLISHED -u ASSURED,SEEN_REPLY; done

Then, I make some TCP connections:

$ echo GET / | nc 192.168.2.2 80

The events show the result:

 [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]

and tcpdump shows no retransmissions:

20:47:57.271951 IP 192.168.2.101.33221 > 192.168.2.2.www: S 435402517:435402517(0) win 5840 <mss 1460,sackOK,timestamp 4294961827 0,nop,wscale 6>
20:47:57.273538 IP 192.168.2.2.www > 192.168.2.101.33221: S 3509927945:3509927945(0) ack 435402518 win 5792 <mss 1460,sackOK,timestamp 235681024 4294961827,nop,wscale 4>
20:47:57.273608 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024>
20:47:57.273693 IP 192.168.2.101.33221 > 192.168.2.2.www: P 435402518:435402524(6) ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024>
20:47:57.275492 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402524 win 362 <nop,nop,timestamp 235681024 4294961827>
20:47:57.276492 IP 192.168.2.2.www > 192.168.2.101.33221: P 3509927946:3509928082(136) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827>
20:47:57.276515 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509928082 win 108 <nop,nop,timestamp 4294961828 235681025>
20:47:57.276521 IP 192.168.2.2.www > 192.168.2.101.33221: F 3509928082:3509928082(0) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827>
20:47:57.277369 IP 192.168.2.101.33221 > 192.168.2.2.www: F 435402524:435402524(0) ack 3509928083 win 108 <nop,nop,timestamp 4294961828 235681025>
20:47:57.279491 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402525 win 362 <nop,nop,timestamp 235681025 4294961828>

I also added a rule to log invalid packets, with no occurrences :-).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23 10:37:34 +01:00
..
9p virtio: add virtio IDs file 2009-09-23 22:26:32 +09:30
802 net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
8021q vlan: Add support to netdev_ops.ndo_fcoe_get_wwn for VLAN device 2009-10-29 01:04:04 -07:00
appletalk net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
atm net: Generalize socket rx gap / receive queue overflow cmsg 2009-10-12 13:26:31 -07:00
ax25 net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
bluetooth Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-27 01:03:26 -07:00
bridge bridge: Optimize multiple unregistration 2009-10-29 01:13:48 -07:00
can net: Cleanup redundant tests on unsigned 2009-10-29 01:39:54 -07:00
core net: Introduce dev_get_by_index_rcu() 2009-10-29 01:42:55 -07:00
dcb net: fix double skb free in dcbnl 2009-09-26 20:16:15 -07:00
dccp net: Fix for dst_negative_advice 2009-10-20 18:55:46 -07:00
decnet net: Fix for dst_negative_advice 2009-10-20 18:55:46 -07:00
dsa netdev: convert pseudo-devices to netdev_tx_t 2009-09-01 01:13:07 -07:00
econet econet: Fix redeclaration of symbol len 2009-10-07 14:43:04 -07:00
ethernet net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
ieee802154 net: sk_drops consolidation 2009-10-14 20:40:11 -07:00
ipv4 netfilter: remove unneccessary checks from netlink notifiers 2009-11-06 17:04:00 +01:00
ipv6 netfilter: remove unneccessary checks from netlink notifiers 2009-11-06 17:04:00 +01:00
ipx net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
irda net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
iucv af_iucv: remove duplicate sock_set_flag 2009-10-17 23:57:20 -07:00
key net: Generalize socket rx gap / receive queue overflow cmsg 2009-10-12 13:26:31 -07:00
lapb net: remove NET_RX_BAD and NET_RX_CN* defines 2009-07-05 19:15:35 -07:00
llc net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
mac80211 mesh: use set_bit() to set MESH_WORK_HOUSEKEEPING. 2009-10-27 16:48:35 -04:00
netfilter netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking 2009-11-23 10:37:34 +01:00
netlabel Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-07-30 19:22:43 -07:00
netlink genetlink: Optimize and one bug fix in genl_generate_id() 2009-10-17 23:57:26 -07:00
netrom net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
packet vlan: allow null VLAN ID to be used 2009-10-27 01:02:33 -07:00
phonet Phonet: hold socket before giving it to sk_deliver_skb() 2009-10-15 12:30:42 -07:00
rds inet: rename some inet_sock fields 2009-10-18 18:52:53 -07:00
rfkill headers: remove sched.h from poll.h 2009-10-04 15:05:10 -07:00
rose net: mark net_proto_ops as const 2009-10-07 01:10:46 -07:00
rxrpc net: Generalize socket rx gap / receive queue overflow cmsg 2009-10-12 13:26:31 -07:00
sched pkt_sched: skbedit add support for setting mark 2009-10-22 21:56:42 -07:00
sctp inet: rename some inet_sock fields 2009-10-18 18:52:53 -07:00
sunrpc inet: rename some inet_sock fields 2009-10-18 18:52:53 -07:00
tipc net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
unix Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-27 01:03:26 -07:00
wanrouter headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
wimax wimax: fix warning caused by not checking retval of rfkill_set_hw_state() 2009-06-11 11:12:48 -07:00
wireless cfg80211: remove warning in deauth case 2009-10-27 16:48:17 -04:00
x25 net: Cleanup redundant tests on unsigned 2009-10-29 01:39:53 -07:00
xfrm xfrm: remove skb_icv_walk 2009-10-18 21:32:01 -07:00
compat.c net: Cleanup redundant tests on unsigned 2009-10-29 01:39:54 -07:00
Kconfig net/compat/wext: send different messages to compat tasks 2009-07-15 08:53:39 -07:00
Makefile net: remove redundant sched/ in net/Makefile 2009-07-12 20:11:14 -07:00
nonet.c
socket.c net: Introduce recvmmsg socket syscall 2009-10-12 23:40:10 -07:00
sysctl_net.c
TUNABLE