Commit Graph

1213 Commits

Author SHA1 Message Date
Patrick McHardy
a3c941b08d [NETFILTER]: Kconfig: improve dependency handling
Instead of depending on internally needed options and letting users
figure out what is needed, select them when needed:

- IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select
  NETFILTER_XTABLES

- NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and
  IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK

- NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:15:02 -08:00
Patrick McHardy
982d9a9ce3 [NETFILTER]: nf_conntrack: properly use RCU for nf_conntrack_destroyed callback
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:14:11 -08:00
Patrick McHardy
6b48a7d08d [NETFILTER]: ip_conntrack: properly use RCU for ip_conntrack_destroyed callback
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:13:58 -08:00
Patrick McHardy
abbaccda4c [NETFILTER]: ip_conntrack: fix invalid conntrack statistics RCU assumption
CONNTRACK_STAT_INC assumes rcu_read_lock in nf_hook_slow disables
preemption as well, making it legal to use __get_cpu_var without
disabling preemption manually. The assumption is not correct anymore
with preemptable RCU, additionally we need to protect against softirqs
when not holding ip_conntrack_lock.

Add CONNTRACK_STAT_INC_ATOMIC macro, which disables local softirqs,
and use where necessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:13:14 -08:00
Patrick McHardy
923f4902fe [NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).
  
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:57 -08:00
Patrick McHardy
642d628b2c [NETFILTER]: ip_conntrack: properly use RCU API for ip_ct_protos array
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:40 -08:00
Patrick McHardy
e22a054869 [NETFILTER]: nf_nat: properly use RCU API for nf_nat_protos array
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
paths used outside of packet processing context (nfnetlink_conntrack).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:26 -08:00
Patrick McHardy
a441dfdbb2 [NETFILTER]: ip_nat: properly use RCU API for ip_nat_protos array
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
paths used outside of packet processing context (nfnetlink_conntrack).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:09 -08:00
Patrick McHardy
e92ad99c78 [NETFILTER]: nf_log: minor cleanups
- rename nf_logging to nf_loggers since its an array of registered loggers

- rename nf_log_unregister_logger() to nf_log_unregister() to make it
  symetrical to nf_log_register() and convert all users

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:11:55 -08:00
Patrick McHardy
c3a47ab3e5 [NETFILTER]: Properly use RCU in nf_ct_attach
Use rcu_assign_pointer/rcu_dereference for ip_ct_attach pointer instead
of self-made RCU and use rcu_read_lock to make sure the conntrack module
doesn't disappear below us while calling it, since this function can be
called from outside the netfilter hooks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:09:19 -08:00
Arjan van de Ven
9a32144e9d [PATCH] mark struct file_operations const 7
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Linus Torvalds
cb18eccff4 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
  [IPV4]: Restore multipath routing after rt_next changes.
  [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch.
  [NET]: Reorder fields of struct dst_entry
  [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer
  [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer
  [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
  [NET]: Introduce union in struct dst_entry to hold 'next' pointer
  [DECNET]: fix misannotation of linkinfo_dn
  [DECNET]: FRA_{DST,SRC} are le16 for decnet
  [UDP]: UDP can use sk_hash to speedup lookups
  [NET]: Fix whitespace errors.
  [NET] XFRM: Fix whitespace errors.
  [NET] X25: Fix whitespace errors.
  [NET] WANROUTER: Fix whitespace errors.
  [NET] UNIX: Fix whitespace errors.
  [NET] TIPC: Fix whitespace errors.
  [NET] SUNRPC: Fix whitespace errors.
  [NET] SCTP: Fix whitespace errors.
  [NET] SCHED: Fix whitespace errors.
  [NET] RXRPC: Fix whitespace errors.
  ...
2007-02-11 11:38:13 -08:00
Robert P. J. Day
c376222960 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:27 -08:00
Eric Dumazet
5ef213f684 [IPV4]: Restore multipath routing after rt_next changes.
I forgot to test build this part of the networking code... Sorry guys.
This patch renames u.rt_next to u.dst.rt_next

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:50 -08:00
Eric Dumazet
093c2ca416 [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
This patch removes the rt_next pointer from 'struct rtable.u' union,
and renames u.rt_next to u.dst_rt_next.

It also moves 'struct flowi' right after 'struct dst_entry' to prepare
the gain on lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:38 -08:00
Eric Dumazet
95f30b336b [UDP]: UDP can use sk_hash to speedup lookups
In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash)  to let
tcp lookups use one cache line per unmatched entry instead of two.

We can also use sk_hash to speedup UDP part as well. We store in sk_hash the
hnum value, and use sk->sk_hash (same cache line than 'next' pointer),
instead of inet->num (different cache line)

Note : We still have a false sharing problem for SMP machines, because
sock_hold(sock) dirties the cache line containing the 'next' pointer. Not
counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) )

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:29 -08:00
YOSHIFUJI Hideaki
e905a9edab [NET] IPV4: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:19:39 -08:00
Eric Dumazet
dbca9b2750 [NET]: change layout of ehash table
ehash table layout is currently this one :

First half of this table is used by sockets not in TIME_WAIT state
Second half of it is used by sockets in TIME_WAIT state.

This is non optimal because of for a given hash or socket, the two chain heads 
are located in separate cache lines.
Moreover the locks of the second half are never used.

If instead of this halving, we use two list heads in inet_ehash_bucket instead 
of only one, we probably can avoid one cache miss, and reduce ram usage, 
particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, 
CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep 
together related chains to speedup lookups and socket state change.

In this patch I did not try to align struct inet_ehash_bucket, but a future 
patch could try to make this structure have a convenient size (a power of two 
or a multiple of L1_CACHE_SIZE).
I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so 
maybe we dont need to scratch our heads to align the bucket...

Note : In case struct inet_ehash_bucket is not a power of two, we could 
probably change alloc_large_system_hash() (in case it use __get_free_pages()) 
to free the unused space. It currently allocates a big zone, but the last 
quarter of it could be freed. Again, this should be a temporary 'problem'.

Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 14:16:46 -08:00
Jan Engelhardt
e60a13e030 [NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:20 -08:00
Jan Engelhardt
6709dbbb19 [NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functions
Use the x_tables functions directly to make it better visible which
parts are shared between ip_tables and ip6_tables.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:19 -08:00
Jan Engelhardt
e1fd0586b0 [NETFILTER]: x_tables: fix return values for LOG/ULOG
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:18 -08:00
Eric Leblond
41f4689a7c [NETFILTER]: NAT: optional source port randomization support
This patch adds support to NAT to randomize source ports.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:17 -08:00
Patrick McHardy
cdd289a2f8 [NETFILTER]: add IPv6-capable TCPMSS target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:16 -08:00
Patrick McHardy
a8d0f9526f [NET]: Add UDPLITE support in a few missing spots
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:14 -08:00
Patrick McHardy
efbc597634 [NETFILTER]: nf_nat: remove broken HOOKNAME macro
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:12 -08:00
Patrick McHardy
a09113c2c8 [NETFILTER]: tcp conntrack: do liberal tracking for picked up connections
Do liberal tracking (only RSTs need to be in-window) for connections picked
up without seeing a SYN to deal with window scaling. Also change logging
of invalid packets not to log packets accepted by liberal tracking to avoid
spamming the logs.

Based on suggestion from James Ralston <ralston@pobox.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:10 -08:00
Stephen Hemminger
22f8cde5bc [NET]: unregister_netdevice as void
There was no real useful information from the unregister_netdevice() return
code, the only error occurred in a situation that was a driver bug. So
change it to a void function.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:06 -08:00
Alexey Dobriyan
cc63f70b8b [IPV4/IPV6] multicast: Check add_grhead() return value
add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb
from it passed to skb_put() without checking.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:04 -08:00
David S. Miller
f2f2102d1a [XFRM]: Fix missed error setting in xfrm4_policy.c
When we can't find the afinfo we should return EAFNOSUPPORT.
GCC warned about the uninitialized 'err' for this path as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:03 -08:00
Miika Komu
4337226228 [IPSEC]: IPv4 over IPv6 IPsec tunnel
This is the patch to support IPv4 over IPv6 IPsec.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:02 -08:00
Miika Komu
c82f963efe [IPSEC]: IPv6 over IPv4 IPsec tunnel
This is the patch to support IPv6 over IPv4 IPsec

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:01 -08:00
Miika Komu
cdca72652a [IPSEC]: exporting xfrm_state_afinfo
This patch exports xfrm_state_afinfo.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:00 -08:00
John Heffner
104439a887 [TCP]: Don't apply FIN exception to full TSO segments.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:51 -08:00
Baruch Even
8a3c3a9727 [TCP]: Check num sacks in SACK fast path
We clear the unused parts of the SACK cache, This prevents us from mistakenly
taking the cache data if the old data in the SACK cache is the same as the data
in the SACK block. This assumes that we never receive an empty SACK block with
start and end both at zero.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:50 -08:00
Baruch Even
6f74651ae6 [TCP]: Seperate DSACK from SACK fast path
Move DSACK code outside the SACK fast-path checking code. If the DSACK
determined that the information was too old we stayed with a partial cache
copied. Most likely this matters very little since the next packet will not be
DSACK and we will find it in the cache. but it's still not good form and there
is little reason to couple the two checks.

Since the SACK receive cache doesn't need the data to be in host order we also
remove the ntohl in the checking loop.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:49 -08:00
Baruch Even
fda03fbb56 [TCP]: Advance fast path pointer for first block only
Only advance the SACK fast-path pointer for the first block, the
fast-path assumes that only the first block advances next time so we
should not move the cached skb for the next sack blocks.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:48 -08:00
David S. Miller
8eb9086f21 [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.
Do this even for non-blocking sockets.  This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).

With help from Venkat Tekkirala.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:45 -08:00
Frederik Deweerdt
ba7808eac1 [TCP]: remove tcp header from tcp_v4_check (take #2)
The tcphdr struct passed to tcp_v4_check is not used, the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:44 -08:00
Patrick McHardy
26932566a4 [NETLINK]: Don't BUG on undersized allocations
Currently netlink users BUG when the allocated skb for an event
notification is undersized. While this is certainly a kernel bug,
its not critical and crashing the kernel is too drastic, especially
when considering that these errors have appeared multiple times in
the past and it BUGs even if no listeners are present.

This patch replaces BUG by WARN_ON and changes the notification
functions to inform potential listeners of undersized allocations
using a unique error code (EMSGSIZE).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:41 -08:00
Patrick McHardy
40e0cb004a [NETFILTER]: ctnetlink: fix compile failure with NF_CONNTRACK_MARK=n
CC      net/netfilter/nf_conntrack_netlink.o
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_conntrack_event':
net/netfilter/nf_conntrack_netlink.c:392: error: 'struct nf_conn' has no member named 'mark'
make[3]: *** [net/netfilter/nf_conntrack_netlink.o] Error 1

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-02 19:33:11 -08:00
Patrick McHardy
adcb471110 [NETFILTER]: SIP conntrack: fix out of bounds memory access
When checking for an @-sign in skp_epaddr_len, make sure not to
run over the packet boundaries.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-30 14:25:24 -08:00
Lars Immisch
7da5bfbb12 [NETFILTER]: SIP conntrack: fix skipping over user info in SIP headers
When trying to skip over the username in the Contact header, stop at the
end of the line if no @ is found to avoid mangling following headers.
We don't need to worry about continuation lines because we search inside
a SIP URI.

Fixes Netfilter Bugzilla #532.

Signed-off-by: Lars Immisch <lars@ibp.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-30 14:24:57 -08:00
Robert Olsson
095b8501e4 [IPV4]: Fix single-entry /proc/net/fib_trie output.
When main table is just a single leaf this gets printed as belonging to the 
local table in /proc/net/fib_trie. A fix is below.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 19:06:01 -08:00
Patrick McHardy
a46bf7d5a8 [NETFILTER]: nf_nat_pptp: fix expectation removal
When removing the expectation for the opposite direction, the PPTP NAT
helper initializes the tuple for lookup with the addresses of the
opposite direction, which makes the lookup fail.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:07:30 -08:00
Patrick McHardy
c72c6b2a29 [NETFILTER]: nf_nat: fix ICMP translation with statically linked conntrack
When nf_nat/nf_conntrack_ipv4 are linked statically, nf_nat is initialized
before nf_conntrack_ipv4, which makes the nf_ct_l3proto_find_get(AF_INET)
call during nf_nat initialization return the generic l3proto instead of
the AF_INET specific one. This breaks ICMP error translation since the
generic protocol always initializes the IPs in the tuple to 0.

Change the linking order and put nf_conntrack_ipv4 first.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:06:47 -08:00
David S. Miller
e89862f4c5 [TCP]: Restore SKB socket owner setting in tcp_transmit_skb().
Revert 931731123a

We can't elide the skb_set_owner_w() here because things like certain
netfilter targets (such as owner MATCH) need a socket to be set on the
SKB for correct operation.

Thanks to Jan Engelhardt and other netfilter list members for
pointing this out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:04:55 -08:00
Baruch Even
db3ccdac26 [TCP]: Fix sorting of SACK blocks.
The sorting of SACK blocks actually munges them rather than sort,
causing the TCP stack to ignore some SACK information and breaking the
assumption of ordered SACK blocks after sorting.

The sort takes the data from a second buffer which isn't moved causing
subsequent data moves to occur from the wrong location. The fix is to
use a temporary buffer as a normal sort does.

Signed-off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-25 13:35:06 -08:00
Eric W. Biederman
6640e69731 [IPV4]: Fix the fib trie iterator to work with a single entry routing tables
In a kernel with trie routing enabled I had a simple routing setup
with only a single route to the outside world and no default
route. "ip route table list main" showed my the route just fine but
/proc/net/route was an empty file.  What was going on?

Thinking it was a bug in something I did and I looked deeper.  Eventually
I setup a second route and everything looked correct, huh?  Finally I
realized that the it was just the iterator pair in fib_trie_get_first,
fib_trie_get_next just could not handle a routing table with a single entry.

So to save myself and others further confusion, here is a simple fix for
the fib proc iterator so it works even when there is only a single route
in a routing table.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-24 14:42:04 -08:00
Jarek Poplawski
52d570aabe [TCP]: rare bad TCP checksum with 2.6.19
The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from collapsed
skb. This patch reverts this change.

The majority of substantial work including heavy testing
and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
Possible reasons pointed by: Herbert Xu and Patrick McHardy.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 22:07:12 -08:00
Masayuki Nakagawa
fb7e2399ec [TCP]: skb is unexpectedly freed.
I encountered a kernel panic with my test program, which is a very
simple IPv6 client-server program.

The server side sets IPV6_RECVPKTINFO on a listening socket, and the
client side just sends a message to the server.  Then the kernel panic
occurs on the server.  (If you need the test program, please let me
know. I can provide it.)

This problem happens because a skb is forcibly freed in
tcp_rcv_state_process().

When a socket in listening state(TCP_LISTEN) receives a syn packet,
then tcp_v6_conn_request() will be called from
tcp_rcv_state_process().  If the tcp_v6_conn_request() successfully
returns, the skb would be discarded by __kfree_skb().

However, in case of a listening socket which was already set
IPV6_RECVPKTINFO, an address of the skb will be stored in
treq->pktopts and a ref count of the skb will be incremented in
tcp_v6_conn_request().  But, even if the skb is still in use, the skb
will be freed.  Then someone still using the freed skb will cause the
kernel panic.

I suggest to use kfree_skb() instead of __kfree_skb().

Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:52 -08:00