tmp_suning_uos_patched

Author	SHA1	Message	Date
Ian McDonald	832e3ca62d	[DCCP] ccid3: return value in ccid3_hc_rx_calc_first_li In a recent patch we introduced invalid return codes which will result in the opposite of what is intended (i.e. send more packets in face of peculiar network conditions). This fixes it by returning ~0 which means not calculated as per dccp_li_hist_calc_i_mean. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-13 16:48:24 -08:00
Arnaldo Carvalho de Melo	8109b02b53	[DCCP]: Whitespace cleanups That accumulated over the last months hackaton, shame on me for not using git-apply whitespace helping hand, will do that from now on. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:35:00 -08:00
Arnaldo Carvalho de Melo	1fba78b6cb	[DCCP] ccid3: Fixup some type conversions related to rtts Spotted by David Miller when compiling on sparc64, I reproduced it here on parisc64, that are the only platforms to define __kernel_suseconds_t as an 'int', all the others, x86_64 and x86 included typedef it as a 'long', but from the definition of suseconds_t it should just be an 'int' on platforms where it is >= 32bits, it would not require all the castings from suseconds_t to (int) when printking variables of this type, that are not needed on parisc64 and sparc64. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:59 -08:00
Gerrit Renker	9e8efc8240	[DCCP] ccid3: BUG-FIX - conversion errors This fixes conversion errors which arose by not properly type-casting from u32 to __u64. Fixed by explicitly casting each type which is not __u64, or by performing operation after assignment. The patch further adds missing debug information to track the current value of X_recv. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:58 -08:00
Gerrit Renker	7af5af3013	[DCCP] ccid3: Reorder packet history source file No code change at all. This reorders the source file to follow the same order as the corresponding header file. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:57 -08:00
Gerrit Renker	85dcb1f780	[DCCP] ccid3: Reorder packet history header file No code change at all. To make the header file easier to read, the following ordering is established among the declarations: * hist_new * hist_delete * hist_entry_new * hist_head * hist_find_entry * hist_add_entry * hist_entry_delete * hist_purge Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:56 -08:00
Gerrit Renker	a967241129	[DCCP] ccid3: Make debug output consistent This patch does not alter any algorithm, just the debug message format: * s#%s, sk=%p#%s(%p)#g * when a statename is present, it now uses %s(%p, state=%s) * when only function entry is debugged, it adds an `- entry' Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:55 -08:00
Gerrit Renker	c5a1ae9a4c	[DCCP] ccid3: Perform history operations only after packet has been sent This migrates all packet history operations into the routine ccid3_hc_tx_packet_sent, thereby removing synchronization problems that occur when, as before, the operations are spread over multiple routines. The following minor simplifications are also applied: * several simplifications now follow from this change - several tests are now no longer required * removal of one unnecessary variable (dp) Justification: Currently packet history operations span two different routines, one of which is likely to pass through several iterations of sleeping and awakening. The first routine, ccid3_hc_tx_send_packet, allocates an entry and sets a few fields. The remaining fields are filled in when the second routine (which is not within a sleeping context), ccid3_hc_tx_packet_sent, is called. This has several strong drawbacks: * it is not necessary to split history operations - all fields can be filled in by the second routine * the first routine is called multiple times, until a packet can be sent, and sleeps meanwhile - this causes a lot of difficulties with regard to keeping the list consistent * since both routines do not have a producer-consumer like synchronization, it is very difficult to maintain data across calls to these routines * the fact that the routines are called in different contexts (sleeping, not sleeping) adds further problems Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:54 -08:00
Gerrit Renker	e312d100f1	[DCCP] ccid3: TX history - remove unused field This removes the `dccphtx_ccval' field since it is nowhere used in the code and in fact not necessary for the accounting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:53 -08:00
Gerrit Renker	9f8681db96	[DCCP] ccid3: Shift window counter computation This puts the window counter computation [RFC 4342, 8.1] into a separate function which is called whenever a new packet is ready for immediate transmission in ccid3_hc_tx_send_packet. Justification: The window counter update was previously computed after the packet was sent. This has two drawbacks, both fixed by this patch: 1) re-compute another timestamp almost directly after the packet was sent (expensive), 2) the CCVal for the window counter is needed at the instant the packet is sent. Further details: The initialisation of the window counter is left in the state NO_SENT, as before. The algorithm will do nothing if either RTT is initialised to 0 (which is ok) or if the RTT value remains below 4 microseconds (which is almost pathological). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:52 -08:00
Gerrit Renker	de553c189e	[DCCP] ccid3: Sanity-check RTT samples CCID3 performance depends much on the accuracy of RTT samples. If RTT samples grow too large, performance can be catastrophically poor. To limit the amount of possible damage in such cases, the patch * introduces an upper limit which identifies a maximum `sane' RTT value; * uses a macro to enforce this upper limit. Using a macro was given preference, since it is necessary to identify the calling function in the warning message. Since exceeding this threshold identifies a critical condition, DCCP_CRIT is used and not DCCP_WARN. Many thanks to Ian McDonald for collaboration on this issue. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:51 -08:00
Gerrit Renker	fe0499ae95	[DCCP] ccid3: Initialise RTT values In both the sender and the receiver it is possible that the stored RTT value is accessed before an actual RTT estimate has been computed. This patch * initialises the sender RTT to 0 - the sender always accesses the RTT in ccid3_hc_tx_packet_sent - the RTT is further needed for the window counter algorithm * replaces the receiver initialisation of 5msec with 0 - which has the same effect and removes an `XXX' - the RTT value is needed in ccid3_hc_rx_packet_recv as rtt_prev Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:50 -08:00
Gerrit Renker	65d6c2b42e	[DCCP] ccid: Deprecate ccid_hc_tx_insert_options The function ccid3_hc_tx_insert_options only does a redundant no-op, as the operation DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count; is already performed _unconditionally_ in ccid3_hc_tx_send_packet. Since there is further no current need for this function, it is removed entirely. Since furthermore, there is actually no present need for the entire interface function ccid_hc_tx_insert_options, it was decided to remove it also, to clean up the interface. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:49 -08:00
Gerrit Renker	f6282f4da5	[DCCP]: Warn when discarding packet due to internal errors This adds a (debug) warning message which is triggered whenever a packet is discarded due to send failure. It also adds a conditional, so that an interruption during dccp_wait_for_ccid is not treated as a `BUG': the rationale is that interruptions are external, whereas bug warnings are concerned with the internals. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:48 -08:00
Gerrit Renker	bf58a381e8	[DCCP]: Only deliver to the CCID rx side in charge This is an optimisation to reduce CPU load. The received feedback is now only directed to the active CCID component, without requiring processing also by the inactive one. As a consequence, a similar test in ccid3.c is now redundant and is also removed. Justification: Currently DCCP works as a unidirectional service, i.e. a listening server is not at the same time a connecting client. As far as I can see, several modifications are necessary until that becomes possible. At the present time, received feedback is both fed to the rx/tx CCID modules. In unidirectional service, only one of these is active at any one time. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:47 -08:00
Gerrit Renker	d63d8364cf	[DCCP]: Simplify TFRC calculation In migrating towards using the newer functions scaled_div/scaled_div32 for TFRC computations mapped from floating-point onto integer arithmetic, this completes the last stage of modifications. In particular, the overflow case for computing X_calc is circumvented by * breaking the computation into two stages * the first stage, res = (s1E6)/R, cannot overflow due to use of u64 in the second stage, res = (res*1E6)/f, overflow on u32 is avoided due to (i) returning UINT_MAX in this case (which is logically appropriate) and (ii) issuing a warning message into the system log (since very likely there is a problem somewhere else with the parameters) Lastly, all such scaling operations are now exported into tfrc.h, since actually this form of scaled computation is specific to TFRC and not to CCID3. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:46 -08:00
Gerrit Renker	0f9e5b573f	[DCCP]: Debug timeval operations Problem: Most target types in the CCID3 code are u32, so subtle conversion errors can occur if signed time calculations yield negative results: the original values are lost in the conversion to unsigned, calculation errors go undetected. This patch therefore * sets all critical time types from unsigned to suseconds_t * avoids comparison between signed/unsigned via type-casting * provides ample warning messages in case time calculations are negative These warning messages can be removed at a later stage when the code has undergone more testing. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:45 -08:00
Gerrit Renker	bfe24a6cc2	[DCCP] ccid3: Simplify calculation for reverse lookup of p This simplifies the calculation of a value p for a given fval when the first loss interval is computed (RFC 3448, 6.3.1). It makes use of the two new functions scaled_div/scaled_div32 to provide overflow protection. Additionally, protection against divide-by-zero is extended - in this case the function will return the maximally possible value of p=100%. Background: The maximum fval, f(100%), is approximately 244, i.e. the scaled value of fval should never exceed 244E6, which fits easily into u32. The problem is the scaling by 10^6, since additionally R(TT) is in microseconds. This is resolved by breaking the division into two stages: the first stage computes fval=(s10^6)/R, stores that into u64; the second stage computes fval = (fval10^6)/X_recv and complains if overflow is reached for u32. This case is safe since the TFRC reverse-lookup routine then returns p=100%. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:44 -08:00
Gerrit Renker	b9039a2a8d	[DCCP] ccid3: Replace scaled division operations This replaces the remaining uses of usecs_div with scaled_div32, which internally uses 64bit division and produces a warning on overflow. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:43 -08:00
Gerrit Renker	1a21e49a8d	[DCCP] ccid3: Finer-grained resolution of sending rates This patch * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0 * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second * simplifies the present overflow problems in calculations Current sending rate X and the cached value X_recv of the receiver-estimated sending rate are both scaled by 64 (2^6) in order to * cope with low sending rates (minimally 1 byte/second) * allow upgrading to use a packets-per-second implementation of CCID 3 * avoid calculation errors due to integer arithmetic cut-off The patch implements a revised strategy from http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html The only difference with regard to that strategy is that t_ipi is already used in the calculation of the nofeedback timeout, which saves one division. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:42 -08:00
Gerrit Renker	179ebc9f92	[DCCP] ccid3: Fix two bugs in sending rate computation This fixes 1) a bug in the recomputation of the sending rate by the nofeedback timer when no feedback at all has so far been sent by the receiver: min_t was used instead of max_t, which is wrong (cf. RFC 3448, p. 10); 2) an error in the computation of larger initial windows: instead of min(... max()) (cf. RFC 4342, 5.), the code had used max(... max()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:41 -08:00
Gerrit Renker	ff58629824	[DCCP] ccid3: Two optimisations for sending rate recomputation This performs two optimisations for the recomputation of the sending rate. 1) Currently the target sending rate X_calc is recalculated whenever a) the nofeedback timer expires, or b) a feedback packet is received. In the (a) case, recomputing X_calc is redundant, since * the parameters p and RTT do not change in between the reception of feedback packets; * the parameter X_recv is either modified from received feedback or via the nofeedback timer; * a test (`p == 0') in the nofeedback timer avoids using a stale/undefined value of X_calc if p was previously 0. 2) The nofeedback timer now only recomputes a timestamp when p == 0. This is according to step (4) of [RFC 3448, 4.3] and avoids unnecessarily determining a timestamp. A debug statement about not updating X is also removed - it helps very little in debugging and just clutters the logs. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:40 -08:00
Gerrit Renker	45393a66a2	[DCCP] ccid3: Check against too large p This patch follows a suggestion by Ian McDonald and ensures that in the current code the value of p can not exceed 100%. Such a value is illegal and would consequently cause a bug condition in tfrc_calc_x(). The receiver case is also tested, and a warning message is added. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:39 -08:00
Ian McDonald	5cc3741d6c	[DCCP]: Remove timeo from output.c It simplifies waiting for the CCID module to signal that a packet is ready to be sent. Other simplifications flow on from this such as removing constants. As a result of this EAGAIN is not returned any more by dccp_wait_for_ccid (which would otherwise lead to unnecessarily discarding the packet in dccp_write_xmit). Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:37 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	54e6ecb239	[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Gerrit Renker	2bbf29acd8	[DCCP] tfrc: Binary search for reverse TFRC lookup This replaces the linear search algorithm for reverse lookup with binary search. It has the advantage of better scalability: O(log2(N)) instead of O(N). This means that the average number of iterations is reduced from 250 (linear search if each value appears equally likely) down to at most 9. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:53:27 -02:00
Gerrit Renker	44158306d7	[DCCP] ccid3: Deprecate TFRC_SMALLEST_P This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P for low-threshold values of p. This avoids masking low-resolution errors. Instead, the code now checks against real boundaries (implemented by preceding patch) and provides warnings whenever a real value falls below the threshold. If such messages are observed, it is a better solution to take this as an indication that the lookup table needs to be re-engineered. Changelog: ---------- This patch * makes handling all TFRC resolution errors local to the TFRC library * removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this condition is already caught by tfrc_calc_x() * removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv since this is now done by the TFRC library * updates BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into account that p now is either 0 (and then X_calc is irrelevant), or it is > 0; since the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library Justification: -------------- The TFRC code uses a lookup table which has a bounded resolution. The lowest possible value of the loss event rate `p' which can be resolved is currently 0.0001. Substituting this lower threshold for p when p is less than 0.0001 results in a huge, exponentially-growing error. The error can be computed by the following formula: (f(0.0001) - f(p))/f(p) * 100 for p < 0.0001 Currently the solution is to use an (arbitrary) value TFRC_SMALLEST_P = 40 * 1E-6 = 0.00004 and to consider all values below this value as `virtually zero'. Due to the exponentially growing resolution error, this is not a good idea, since it hides the fact that the table can not resolve practically occurring cases. Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%! Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:53:07 -02:00
Gerrit Renker	006042d7e1	[DCCP] tfrc: Identify TFRC table limits and simplify code This * adds documentation about the lowest resolution that is possible within the bounds of the current lookup table * defines a constant TFRC_SMALLEST_P which defines this resolution * issues a warning if a given value of p is below resolution * combines two previously adjacent if-blocks of nearly identical structure into one This patch does not change the algorithm as such. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:52:41 -02:00
Gerrit Renker	8d0086adac	[DCCP] tfrc: Add protection against invalid parameters to TFRC routines 1) For the forward X_calc lookup, it * protects effectively against RTT=0 (this case is possible), by returning the maximal lookup value instead of just setting it to 1 * reformulates the array-bounds exceeded condition: this only happens if p is greater than 1E6 (due to the scaling) * the case of negative indices can now with certainty be excluded, since documentation shows that the formulas are within bounds * additional protection against p = 0 (would give divide-by-zero) 2) For the reverse lookup, it warns against * protects against exceeding array bounds * now returns 0 if f(p) = 0, due to function definition * warns about minimal resolution error and returns the smallest table value instead of p=0 [this would mask congestion conditions] Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:52:26 -02:00
Gerrit Renker	90fb0e60dd	[DCCP] tfrc: Fix small error in reverse lookup of p for given f(p) This fixes the following small error in tfrc_calc_x_reverse_lookup. 1) The table is generated by the following equations: lookup[index][0] = g((index+1) * 1000000/TFRC_CALC_X_ARRSIZE); lookup[index][1] = g((index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE); where g(q) is 1E6 * f(q/1E6) 2) The reverse lookup assigns an entry in lookup[index][small] 3) This index needs to match the above, i.e. * if small=0 then p = (index + 1) * 1000000/TFRC_CALC_X_ARRSIZE * if small=1 then p = (index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE These are exactly the changes that the patch makes; previously the code did not conform to the way the lookup table was generated (this difference resulted in a mean error of about 1.12%). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:52:01 -02:00
Gerrit Renker	50ab46c790	[DCCP] tfrc: Document boundaries and limits of the TFRC lookup table This adds documentation for the TCP Reno throughput equation which is at the heart of the TFRC sending rate / loss rate calculations. It spells out precisely how the values were determined and what they mean. The equations were derived through reverse engineering and found to be fully accurate (verified using test programs). This patch does not change any code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:51:29 -02:00
Gerrit Renker	26af3072b0	[DCCP] ccid3: Fix warning message about illegal ACK This avoids a (harmless) warning message being printed at the DCCP server (the receiver of a DCCP half connection). Incoming packets are both directed to * ccid_hc_rx_packet_recv() for the server half * ccid_hc_tx_packet_recv() for the client half The message gets printed since on a server the client half is currently not sending data packets. This is resolved for the moment by checking the DCCP-role first. In future times (bidirectional DCCP connections), this test may have to be more sophisticated. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:51:14 -02:00
Gerrit Renker	5c3fbb6acf	[DCCP] ccid3: Fix bug in calculation of send rate The main object of this patch is the following bug: ==> In ccid3_hc_tx_packet_recv, the parameters p and X_recv were updated _after_ the send rate was calculated. This is clearly an error and is resolved by re-ordering statements. In addition, * r_sample is converted from u32 to long to check whether the time difference was negative (it would otherwise be converted to a large u32 value) * protection against RTT=0 (this is possible) is provided in a further patch * t_elapsed is also converted to long, to match the type of r_sample * adds a a more debugging information regarding current send rates * various trivial comment/documentation updates Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:50:56 -02:00
Gerrit Renker	76d127779e	[DCCP]: Fix BUG in retransmission delay calculation This bug resulted in ccid3_hc_tx_send_packet returning negative delay values, which in turn triggered silently dequeueing packets in dccp_write_xmit. As a result, only a few out of the submitted packets made it at all onto the network. Occasionally, when dccp_wait_for_ccid was involved, this also triggered a bug warning since ccid3_hc_tx_send_packet returned a negative value (which in reality was a negative delay value). The cause for this bug lies in the comparison if (delay >= hctx->ccid3hctx_delta) return delay / 1000L; The type of `delay' is `long', that of ccid3hctx_delta is `u32'. When comparing negative long values against u32 values, the test returned `true' whenever delay was smaller than 0 (meaning the packet was overdue to send). The fix is by casting, subtracting, and then testing the difference with regard to 0. This has been tested and shown to work. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:50:42 -02:00
Gerrit Renker	8a508ac26e	[DCCP]: Use higher RTO default for CCID3 The TFRC nofeedback timer normally expires after the maximum of 4 RTTs and twice the current send interval (RFC 3448, 4.3). On LANs with a small RTT this can mean a high processing load and reduced performance, since then the nofeedback timer is triggered very frequently. This patch provides a configuration option to set the bound for the nofeedback timer, using as default 100 milliseconds. By setting the configuration option to 0, strict RFC 3448 behaviour can be enforced for the nofeedback timer. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:50:23 -02:00
Gerrit Renker	6b57c93dc3	[DCCP]: Use `unsigned' for packet lengths This patch implements a suggestion by Ian McDonald and 1) Avoids tests against negative packet lengths by using unsigned int for packet payload lengths in the CCID send_packet()/packet_sent() routines 2) As a consequence, it removes an now unnecessary test with regard to `len > 0' in ccid3_hc_tx_packet_sent: that condition is always true, since * negative packet lengths are avoided * ccid3_hc_tx_send_packet flags an error whenever the payload length is 0. As a consequence, ccid3_hc_tx_packet_sent is never called as all errors returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit 3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter), since it is currently always set to skb->len. The code is updated with regard to this parameter change. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:31:02 -08:00
Gerrit Renker	a79ef76f4d	[DCCP] ccid3: Larger initial windows This implements the larger-initial-windows feature for CCID 3, as described in section 5 of RFC 4342. When the first feedback packet arrives, the sender can send up to 2..4 packets per RTT, instead of just one. The patch further * reduces the number of timestamping calls by passing the timestamp value (which is computed in one of the calling functions anyway) as argument * renames one constant with a very long name into one which is shorter and resembles the one in RFC 3448 (t_mbi) * simplifies some of the min_t/max_t cases where both `x', `y' have the same type Commiter note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:31:01 -08:00
Arnaldo Carvalho de Melo	841bac1d60	[DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0 To reflect the fact that this now is of no effect, not making apps stop working, just be warned in the system log. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:31:00 -08:00
Gerrit Renker	5aed324369	[DCCP]: Tidy up unused structures This removes and cleans up unused variables and structures which have become unnecessary following the introduction of the EWMA patch to automatically track the CCID 3 receiver/sender packet sizes `s'. It deprecates the PACKET_SIZE socket option by returning an error code and printing a deprecation warning if an application tries to read or write this socket option. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:59 -08:00
Gerrit Renker	78ad713da6	[DCCP] ccid3: Track RX/TX packet size `s' using moving-average Problem:	2006-12-02 21:30:58 -08:00
Gerrit Renker	2a1fda6f6c	[DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448 This corrects the setting of the nofeedback timer with regard to RFC 3448 - previously it was not set to max(4R, 2s/X) as specified. Using the maximum of 1 second as upper bound (as it was done before) can have detrimental effects, especially if R is small. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:57 -08:00
Gerrit Renker	4384260443	[DCCP]: Remove allocation of sysctl numbers This is in response to a request sent earlier by Eric W. Biederman and replaces all sysctl numbers for net.dccp.default with CTL_UNNUMBERED. It has been tested to compile and to work. Commiter note: I've removed the use of CTL_UNNUMBERED, not setting .ctl_name sets it to 0, that is the what CTL_UNNUMBERED is, reason is to avoid unneeded source code cluttering. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:56 -08:00
Gerrit Renker	5d0dbc4a9b	[DCCP] ccid3: Consolidate handling of t_RTO This patch * removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is undefined until feedback has been received); * makes some trivial changes (updates of comments); * performs a small optimisation by exploiting that the feedback timeout uses the value of t_ipi. The way it is done is safe, because the timeouts appear after the changes to t_ipi, ensuring that up-to-date values are used; * in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation of the next_tmout. This makes the code clearer to read and is also safe, since t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not read by the functions called via ccid_wait_for_ccid(); * removes a `max' statement in sk_reset_timer, this is not needed since the timeout value is always greater than 1E6 microseconds. * adds `XXX'es to highlight that currently the nofeedback timer is set in a non-standard way Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:52 -08:00
Gerrit Renker	17893bc1a6	[DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta This patch: * consolidates updating of parameters (t_nom, t_ipi, t_delta) which need to be updated at the same time, since they are inter-dependent * removes two inline functions which are no longer needed as a result of the above consolidation * resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback timer, in the state where no feedback has previously been received * ties updating these parameters to updating the sending rate X, exploiting that all three parameters in turn depend on X; and using a small optimisation which can reduce the number of required instructions: only update the three parameters when X really changes Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:51 -08:00
Gerrit Renker	48e03eee71	[DCCP] ccid3: Consolidate timer resets This patch concerns updating the value of the nofeedback timer when no feedback has been received so far. Since in this case the value of R is still undefined according to [RFC 3448, 4.2], we can not perform step (3) of [RFC 3448, 4.3]. A clarification is provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback timer (still) expires "after two seconds". Many thanks to Ian McDonald for pointing this out and providing the clarification. The patch * implements [RFC 4342, sec. 5] with regard to the above case * consolidates handling timer restart by - adding an appropriate jump label and - initialising the timeout value Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:50 -08:00
Gerrit Renker	5e19e3fcd7	[DCCP] ccid3: Resolve small FIXME This considers the case - ACK received while no packet has been sent so far. Resolved by printing a (rate-limited) warning message. Further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv, received feedback on a terminating connection is simply ignored. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:41 -08:00
Gerrit Renker	70dbd5b0ef	[DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent This patch removes a switch statement which is redundant since, * nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK * it is impossible that the function is called in the state TFRC_SSTATE_TERM, since --the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet --if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns -EINVAL, which means that ccid3_hc_tx_packet_sent will not be called (compare dccp_write_xmit) --> therefore, this case is logically impossible * the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom, and t_delta. This is a no-op, since --t_ipi only changes when feedback is received --however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical code block which performs the same set of operations --performing the same set of operations again in ccid3_hc_tx_packet_sent therefore does not change anything, since between the time of receiving the last feedback (and therefore update of t_ipi, t_nom, and t_delta), the value of t_ipi has not changed --since t_ipi has not changed, the values of t_delta and t_nom also do not change, they depend fully on t_ipi Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:40 -08:00
Gerrit Renker	da335baf9e	[DCCP] ccid3: Avoid congestion control on zero-sized data packets This resolves an `XXX' in ccid3_hc_tx_send_packet(). The function is only called on Data and DataAck packets and returns a negative result on zero-sized messages. This is a reasonable policy since CCID 3 is a congestion-control module and congestion control on zero-sized Data(Ack) packets is in a way pathological. The patch uses a more suitable error code for this case, it returns the Posix.1 code `EBADMSG' ("Not a data message") instead of `ENOTCONN'. As a result of ignoring zero-sized packets, a the condition for a warning "First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this message has been removed since it will always be printed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:39 -08:00
Gerrit Renker	7da7f456d7	[DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet This makes some logically equivalent simplifications, by replacing rc - values plus goto's with direct return statements. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:38 -08:00
Gerrit Renker	91cf5a1725	[DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission Problem:	2006-12-02 21:30:37 -08:00
Gerrit Renker	f5c2d6367b	[DCCP] ccid3: Simplify control flow in the calculation of t_ipi This patch performs a simplifying (performance) optimisation: In each call of the inline function ccid3_calc_new_t_ipi(), the state is tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function is called very often. A simpler solution, implemented by this patch, is to adapt the control flow. Background:	2006-12-02 21:30:36 -08:00
Gerrit Renker	90feeb951f	[DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi Problem:	2006-12-02 21:30:35 -08:00
Andrea Bittau	6472c051fc	[DCCP] ccid2: Allow window to grow larger Now that we can stuff bigger ack vectors into options. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:34 -08:00
Andrea Bittau	522f1d095b	[DCCP] ackvec: Split long ack vectors across multiple options Ack vectors grow proportional to the window size. If an ack vector does not fit into a single option, it must be spread across multiple options. This patch will allow for windows to grow larger. Committer note: Simplified the patch a bit, original algorithm kept. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:33 -08:00
Andrea Bittau	bdf13d208d	[DCCP] ackvec: infrastructure for sending more than one ackvec per packet Commiter note: This was split from Andrea's original patch, in the process I changed the type of the ackvec index fields to u16 instead of to int and haven't folded dccp_ackvec_parse with dccp_ackvec_check_rcv_ackno. Next patch will actually do the insertion of more than one ackvec per packet, using, initially, up to a max of 2 ackvecs as per Andrea's original patch, then I'll work on support for larger ackvecs, be it using a sysctl or using setsockopt. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:32 -08:00
Andrea Bittau	0bd4ff1b15	[DCCP] ackvec: Remove unused dccpav_ack_ptr field from dccp_ackvec Commiter note: original patch was splitted. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:31 -08:00
Ian McDonald	82e3ab9dbe	[DCCP]: Adds the tx buffer sysctls This one got lost on the way from Ian to Gerrit to me, fix it. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:42 -08:00
Ian McDonald	455431739c	[DCCP] CCID3: Remove non-referenced variable This removes a non-referenced variable. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:41 -08:00
Ian McDonald	e1b7441e80	[DCCP]: Make dccp_probe more portable This makes the code of the dccp_probe module more portable. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:39 -08:00
Gerrit Renker	23ea8945f6	[CCID 3]: Add annotations for socket structures This adds documentation to the CCID 3 rx/tx socket fields, plus some minor re-formatting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:38 -08:00
Gerrit Renker	59348b19ef	[DCCP]: Simplified conditions due to use of enum:8 states This reaps the benefit of the earlier patch, which changed the type of CCID 3 states to use enums, in that many conditions are now simplified and the number of possible (unexpected) values is greatly reduced. In a few instances, this also allowed to simplify pre-conditions; where care has been taken to retain logical equivalence. [DCCP]: Introduce a consistent BUG/WARN message scheme This refines the existing set of DCCP messages so that * BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts * DCCP_CRIT (for severe warnings) is not rate-limited * DCCP_WARN() is introduced as rate-limited wrapper Using these allows a faster and cleaner transition to their original counterparts once the code has matured into a full DCCP implementation. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:38 -08:00
Ian McDonald	b1308dc015	[DCCP]: Set TX Queue Length Bounds via Sysctl Previously the transmit queue was unbounded. This patch: * puts a limit on transmit queue length and sends back EAGAIN if the buffer is full * sets the TX queue length to a sensible default * implements tx buffer sysctls for DCCP Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:37 -08:00
Gerrit Renker	56724aa434	[DCCP]: Add CCID3 debug support to Kconfig This adds a CCID3 debug option to the configuration menu which is missing in Kconfig, but already used by the code. CCID 2 already provides such an entry. To enable debugging, set CONFIG_IP_DCCP_CCID3_DEBUG=y NOTE: The use of ccid3_{t,r}x_state_name is safe, since now only enum values can appear. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:36 -08:00
Gerrit Renker	84116716cc	[DCCP]: enable debug messages also for static builds This patch * makes debugging (when configured) work both for static / module build * provides generic debugging macros for use in other DCCP / CCID modules * adds missing information about debug parameters to Kconfig * performs some code tidy-up Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:35 -08:00
Arnaldo Carvalho de Melo	eed73417d5	[DCCP]: Use kmemdup Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/dccp.ko.before /tmp/dccp.ko.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/dccp/feat.c: __dccp_feat_init \| -16 dccp_feat_change_recv \| -55 dccp_feat_clone \| -56 3 functions changed, 127 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:59 -08:00
Andrea Bittau	32aac18dfa	[DCCP] CCID2: Code optimizations These are code optimizations which are relevant when dealing with large windows. They are not coded the way I would like to, but they do the job for the short-term. This patch should be more neat. Commiter note: Changed the seqno comparisions to use {after,before}48 to handle wrapping. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:52 -08:00
Arnaldo Carvalho de Melo	58a5a7b955	[NET]: Conditionally use bh_lock_sock_nested in sk_receive_skb Spotted by Ian McDonald, tentatively fixed by Gerrit Renker: http://www.mail-archive.com/dccp%40vger.kernel.org/msg00599.html Rewritten not to unroll sk_receive_skb, in the common case, i.e. no lock debugging, its optimized away. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:51 -08:00
Arnaldo Carvalho de Melo	e523a1550e	[DCCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in dccp_v4_err() Spotted by Eric Dumazet in tcp_v4_rcv(). Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:50 -08:00
Gerrit Renker	3c6952624a	[DCCP]: Introduce DCCP_{BUG{_ON},CRIT} macros, use enum:8 for the ccid3 states This patch tackles the following problem: * the ccid3_hc_{t,r}x_sock define ccid3hc{t,r}x_state as `u8', but in reality there can only be a few, pre-defined enum names * this necessitates addiditional checking for unexpected values which would otherwise be caught by the compiler Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:49 -08:00
Al Viro	7d533f9418	[NET]: More dccp endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:45 -08:00
Al Viro	868c86bcb5	[NET]: annotate csum_ipv6_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:31 -08:00
Al Viro	2bda285315	[NET]: Annotate csum_tcpudp_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:30 -08:00
YOSHIFUJI Hideaki	cfb6eeb4c8	[TCP]: MD5 Signature Option (RFC2385) support. Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:39 -08:00
Andrea Bittau	d23ca15a21	[DCCP] ACKVEC: Optimization - Do not traverse records if none will be found Do not traverse the list of ack vector records [proportional to window size] when we know we will not find what we are looking for. This is especially useful because ack vectors are checked twice: 1) Upon parsing of options. 2) Upon notification of a new ack. All of the work will occur during check #1. Therefore, when check #2 is performed, no new work will be done. This is now "detected" and there is no performance hit when doing #2. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:31 -08:00
Gerrit Renker	09dbc3895e	[DCCP]: Miscellaneous code tidy-ups This patch does not change code; it performs some trivial clean/tidy-ups: * removal of a `debug_prefix' string in favour of the already existing dccp_role(sk) * add documentation of structures and constants * separated out the cases for invalid packets (step 1 of the packet validation) * removing duplicate statements * combining declaration & initialisation Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:30 -08:00
Gerrit Renker	c02fdc0e81	[DCCP]: Make feature negotiation more readable This patch replaces cryptic feature negotiation messages of type Oct 31 15:42:20 kernel: dccp_feat_change: feat change type=32 feat=1 Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=34 feat=1 Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=32 feat=5 into ones of type: Nov 2 13:54:45 kernel: dccp_feat_change: ChangeL(CCID (1), 3) Nov 2 13:54:45 kernel: dccp_feat_change: ChangeR(CCID (1), 3) Nov 2 13:54:45 kernel: dccp_feat_change: ChangeL(Ack Ratio (5), 2) Also, * completed the feature number list wrt RFC 4340 sec. 6.4 * annotating which ones have been implemented so far * implemented rudimentary sanity checking in feat.c (FIXMEs) * some minor fixes Commiter note: uninlined dccp_feat_name and dccp_feat_typename, for consistency with dccp_{state,packet}_name, that, BTW, should be compiled only if CONFIG_IP_DCCP_DEBUG is selected, leaving this to another cset tho. Also shortened dccp_feat_negotiation_debug to dccp_feat_debug. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:29 -08:00
Gerrit Renker	6a128e053e	[DCCPv6]: Resolve conditional build problem Resolves the problem that if IPv6 was configured `y' and DCCP `m' then dccp_ipv6 was not built as a module. With this change, dccp_ipv6 is built as a module whenever DCCP OR IPv6 are configured as modules; it will be built-in only if both DCCP = `y' and IPV6 = `y'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:28 -08:00
Gerrit Renker	b9df3cb8cf	[TCP/DCCP]: Introduce net_xmit_eval Throughout the TCP/DCCP (and tunnelling) code, it often happens that the return code of a transmit function needs to be tested against NET_XMIT_CN which is a value that does not indicate a strict error condition. This patch uses a macro for these recurring situations which is consistent with the already existing macro net_xmit_errno, saving on duplicated code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:27 -08:00
Gerrit Renker	d7f7365f57	[DCCPv6]: Choose a genuine initial sequence number This * resolves a FIXME - DCCPv6 connections started all with an initial sequence number of 1; * provides a redirection `secure_dccpv6_sequence_number' in case the init_sequence_v6 code should be updated later; * concentrates the update of S.GAR into dccp_connect_init(); * removes a duplicate dccp_update_gss() in ipv4.c; * uses inet->dport instead of usin->sin_port, due to the following assignment in dccp_v4_connect(): inet->dport = usin->sin_port; Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:22 -08:00
Gerrit Renker	865e9022d8	[DCCP]: Remove redundant statements in init_sequence (ISS) This patch removes the following redundancies: 1) The test skb->protocol == htons(ETH_P_IPV6) in dccp_v6_init_sequence is always true since * dccp_v6_conn_request() is the only calling function * dccp_v6_conn_request() redirects all skb's with ETH_P_IP to dccp_v4_conn_request() 2) The first argument, `struct sock *sk', of dccp_v{4,6}_init_sequence() is never used. (This is similar for tcp_v{4,6}_init_sequence, an analogous patch has been submitted to netdev and merged.) By the way - are the `sport' / `dport' arguments in the right order? I have made them consistent among calls but they seem to be in the reverse order. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:21 -08:00
Gerrit Renker	4ed800d02c	[DCCP]: Remove forward declarations in timer.c This removes 3 forward declarations by reordering 2 functions. No code change at all. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:20 -08:00
Gerrit Renker	afb0a34dd3	[DCCP]: Introduce a consistent naming scheme for sysctls In order to make their function clearer and obtain a consistent naming scheme to identify sysctls, all existing DCCP sysctls have been prefixed with `sysctl_dccp', following the same convention as used by TCP. Feature-specific sysctls retain the `feat' in the middle, although the `default' has been dropped, since it is obvious from use. Also removed a duplicate `dccp_feat_default_sequence_window' in ipv4.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:19 -08:00
Gerrit Renker	2e2e9e92bd	[DCCP]: Add sysctls to control retransmission behaviour This adds 3 sysctls which govern the retransmission behaviour of DCCP control packets (3way handshake, feature negotiation). It removes 4 FIXMEs from the code. The close resemblance of sysctl variables to their TCP analogues is emphasised not only by their name, but also by giving them the same initial values. This is useful since there is not much practical experience with DCCP yet. Furthermore, with regard to the previous patch, it is now possible to limit the number of keepalive-Responses by setting net.dccp.default.request_retries (also a bit like in TCP). Lastly, added documentation of all existing DCCP sysctls. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:18 -08:00
Gerrit Renker	e11d9d3080	[DCCP]: Increment sequence numbers on retransmitted Response packets Problem:	2006-12-02 21:22:17 -08:00
Gerrit Renker	08a29e41bb	[DCCP]: Update comments on precisely which packets can be retransmitted This updates program documentation: spell out precise conditions about which packets are eligible for retransmission (which is actually quite hard to extract from RFC 4340). It is based on the following table derived from RFC 4340: +-----------+---------------------------------+---------------------+ \| Type \| Retransmit? \| Remark \| +-----------+---------------------------------+---------------------+ \| Request \| in client-REQUEST state \| sec. 8.1.1 \| \| Response \| NEVER \| SHOULD NOT, 8.1.3 \| \| Data \| NEVER \| unreliable protocol \| \| Ack \| possible in client-PARTOPEN \| sec. 8.1.5 \| \| DataAck \| NEVER \| unreliable protocol \| \| CloseReq \| only in server-CLOSEREQ state \| MUST, sec. 8.3 \| \| Close \| in node-CLOSING state \| MUST, sec. 8.3 \| +-----------+-------------------------------------------------------+ \| Reset \| only in response to other packets \| \| Sync \| only in response to sequence-invalid packets (7.5.4) \| \| SyncAck \| only in response to Sync packets \| +-----------+-------------------------------------------------------+ Hence the only packets eligible for retransmission are: * Requests in client-REQUEST state (sec. 8.1.1) * Acks in client-PARTOPEN state (sec. 8.1.5) * CloseReq in server-CLOSEREQ state (sec. 8.3) * Close in node-CLOSING state (sec. 8.3) I had meant to put in a check for these types too, but have left that for later. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:16 -08:00
Gerrit Renker	6f4e5fff1e	[DCCP]: Support for partial checksums (RFC 4340, sec. 9.2) This patch does the following: a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2] b) provides necessary socket options and documentation as to how to use them c) basic support and infrastructure for the Minimum Checksum Coverage feature [RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user interface In addition, it (1) fixes two bugs in the DCCPv4 checksum computation: * pseudo-header used checksum_len instead of skb->len * incorrect checksum coverage calculation based on dccph_x (2) removes dccp_v4_verify_checksum() since it reduplicates code of the checksum computation; code calling this function is updated accordingly. (3) now uses skb_checksum(), which is safer than checksum_partial() if the sk_buff has is a non-linear buffer (has pages attached to it). (4) fixes an outstanding TODO item: * If P.CsCov is too large for the packet size, drop packet and return. The code has been tested with applications, the latest version of tcpdump now comes with support for partial DCCP checksums. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:09 -08:00
Gerrit Renker	d83ca5accb	[DCCP]: Update code comments for Step 2/3 Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340. All comments have been updated against this document, and the reference to step 2 has been made consistent throughout the files. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:04 -08:00
Gerrit Renker	cf557926f6	[DCCP]: tidy up dccp_v{4,6}_conn_request This is a code simplification to remove reduplicated code by concentrating and abstracting shared code. Detailed Changes:	2006-12-02 21:22:03 -08:00
Ian McDonald	f45b3ec481	[DCCP]: Fix logfile overflow This patch fixes data being spewed into the logs continually. As the code stood if there was a large queue and long delays timeo would go down to zero and never get reset. This fixes it by resetting timeo. Put constant into header as well. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:02 -08:00
Ian McDonald	fec5b80e49	[DCCP]: Fix DCCP Probe Typo Fixes a typo in Kconfig, patch is by Ian McDonald and is re-sent from http://www.mail-archive.com/dccp@vger.kernel.org/msg00579.html Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:01 -08:00
Gerrit Renker	73c9e02c22	[DCCPv6]: remove forward declarations in ipv6.c This does the same for ipv6.c as the preceding one does for ipv4.c: Only the inet_connection_sock_af_ops forward declarations remain, since at least dccp_ipv6_mapped has a circular dependency to dccp_v6_request_recv_sock. No code change, merely re-ordering. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:00 -08:00
Gerrit Renker	3d2fe62b8d	[DCCPv4]: remove forward declarations in ipv4.c This relates to Arnaldo's announcement in http://www.mail-archive.com/dccp@vger.kernel.org/msg00604.html Originally this had been part of the Oops fix and is a revised variant of http://www.mail-archive.com/dccp@vger.kernel.org/msg00598.html No code change, merely reshuffling, with the particular objective of having all request_sock_ops close(r) together for more clarity. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:59 -08:00
Gerrit Renker	8a73cd09d9	[DCCP]: calling dccp_v{4,6}_reqsk_send_ack is a BUG This patch removes two functions, the send_ack functions of request_sock, which are not called/used by the DCCP code. It is correct that these functions are not called, below is a justification why calling these functions (on a passive socket in the LISTEN/RESPOND state) would mean a DCCP protocol violation. A) Background: using request_sock in TCP:	2006-12-02 21:21:58 -08:00
Arnaldo Carvalho de Melo	f6484f7c7a	[DCCP] timewait: Remove leftover extern declarations Gerrit Renker noticed dccp_tw_deschedule and submitted a patch with a FIXME, but as he suggests in the same patch the best thing is to just ditch this declaration, while doing that also noticed that tcp_tw_count is as well not defined anywhere, so ditch it too. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:57 -08:00
Gerrit Renker	d23c7107bf	[DCCP]: Simplify jump labels in dccp_v{4,6}_rcv This is a code simplification and was singled out from the DCCPv6 Oops patch on http://www.mail-archive.com/dccp@vger.kernel.org/msg00600.html It mainly makes the code consistent between ipv{4,6}.c for the functions dccp_v4_rcv dccp_v6_rcv and removes the do_time_wait label to simplify code somewhat. Commiter note: fixed up a compile problem, trivial. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:56 -08:00
Gerrit Renker	9b42078ed6	[DCCP]: Combine allocating & zeroing header space on skb This is a code simplification: it combines three often recurring operations into one inline function, * allocate `len' bytes header space in skb * fill these `len' bytes with zeroes * cast the start of this header space as dccp_hdr Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:55 -08:00
Gerrit Renker	89e7e57778	[DCCPv6]: Add a FIXME for missing IPV6_PKTOPTIONS This refers to the possible memory leak pointed out in http://www.mail-archive.com/dccp@vger.kernel.org/msg00574.html, fixed by David Miller in http://www.mail-archive.com/netdev@vger.kernel.org/msg24881.html and adds a FIXME to point out where code is missing. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:54 -08:00
Gerrit Renker	60361be1be	[DCCP]: set safe upper bound for option length This is a re-send from http://www.mail-archive.com/dccp@vger.kernel.org/msg00553.html It is the same patch as before, but I have built in Arnaldo's suggestions pointed out in that posting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:53 -08:00
David S. Miller	931731123a	[TCP]: Don't set SKB owner in tcp_transmit_skb(). The data itself is already charged to the SKB, doing the skb_set_owner_w() just generates a lot of noise and extra atomics we don't really need. Lmbench improvements on lat_tcp are minimal: before: TCP latency using localhost: 23.2701 microseconds TCP latency using localhost: 23.1994 microseconds TCP latency using localhost: 23.2257 microseconds after: TCP latency using localhost: 22.8380 microseconds TCP latency using localhost: 22.9465 microseconds TCP latency using localhost: 22.8462 microseconds Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:52 -08:00
David S. Miller	494b4e7d81	[DCCP]: Fix typo _read_mostly --> __read_mostly. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:45 -08:00
Eric Dumazet	72a3effaf6	[NET]: Size listen hash tables using backlog hint We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:44 -08:00
Akinobu Mita	ac16ca6412	[NET]: Fix kfifo_alloc() error check. The return value of kfifo_alloc() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-25 15:16:49 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
YOSHIFUJI Hideaki	f2776ff047	[IPV6]: Fix address/interface handling in UDP and DCCP, according to the scoping architecture. TCP and RAW do not have this issue. Closes Bug #7432. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-21 17:41:56 -08:00
Randy Dunlap	234af48401	[DCCP]: fix printk format warnings Fix printk format warnings: build2.out:net/dccp/ccids/ccid2.c:355: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:360: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:482: warning: long long unsigned int format, u64 arg (arg 5) build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 4) build2.out:net/dccp/ccids/ccid2.c:674: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:720: warning: long long unsigned int format, u64 arg (arg 3) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:37 -08:00
Gerrit Renker	0e64e94e47	[DCCP]: Update documentation references. Updates the references to spec documents throughout the code, taking into account that * the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year * RFC 1063 was obsoleted by RFC 1191 * draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational RFC, RFC 2923 on 2000-09-22. All references verified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-24 16:17:51 -07:00
David S. Miller	fd169f15a6	[DCCP] ipv6: Fix opt_skb leak. Based upon a patch from Jesper Juhl. Try to match the TCP IPv6 code this was copied from as much as possible, so that it's easy to see where to add the ipv6 pktoptions support code. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-21 19:55:21 -07:00
Gerrit Renker	82709531a8	[DCCP]: Fix Oops in DCCPv6 I think I got the cause for the Oops observed in http://www.mail-archive.com/dccp@vger.kernel.org/msg00578.html The problem is always with applications listening on PF_INET6 sockets. Apart from the mentioned oops, I observed another one one, triggered at irregular intervals via timer interrupt: run_timer_softirq -> dccp_keepalive_timer -> inet_csk_reqsk_queue_prune -> reqsk_free -> dccp_v6_reqsk_destructor The latter function is the problem and is also the last function to be called in said kernel panic. In any case, there is a real problem with allocating the right request_sock which is what this patch tackles. It fixes the following problem: - application listens on PF_INET6 - DCCPv4 packet comes in, is handed over to dccp_v4_do_rcv, from there to dccp_v4_conn_request Now: socket is PF_INET6, packet is IPv4. The following code then furnishes the connection with IPv6 - request_sock operations: req = reqsk_alloc(sk->sk_prot->rsk_prot); The first problem is that all further incoming packets will get a Reset since the connection can not be looked up. The second problem is worse: --> reqsk_alloc is called instead of inet6_reqsk_alloc --> consequently inet6_rsk_offset is never set (dangling pointer) --> the request_sock_ops are nevertheless still dccp6_request_ops --> destructor is called via reqsk_free --> dccp_v6_reqsk_destructor tries to free random memory location (inet6_rsk_offset not set) --> panic I have tested this for a while, DCCP sockets are now handled correctly in all three scenarios (v4/v6 only/v4-mapped). Commiter note: I've added the dccp_request_sock_ops forward declaration to keep the tree building and to reduce the size of the patch for 2.6.19, later I'll move the functions to the top of the affected source code to match what we have in the TCP counterpart, where this problem hasn't existed in the first place, dumb me not to have done the same thing on DCCP land 8) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-10-21 19:55:20 -07:00
YOSHIFUJI Hideaki	9469c7b4aa	[NET]: Use typesafe inet_twsk() inline function instead of cast. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:58 -07:00
Al Viro	bada8adc4e	[IPV4]: ip_route_connect() ipv4 address arguments annotated annotated address arguments (port number left alone for now); ditto for inferred net-endian variables in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 17:54:06 -07:00
Ian McDonald	e41542f516	[DCCP]: Introduce dccp_probe This adds DCCP probing shamelessly ripped off from TCP probes by Stephen Hemminger. I've put in here support for further CCID3 variables as well. Andrea/Arnaldo might look to extend for CCID2. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-09-24 18:08:17 -03:00
Ian McDonald	3dd9a7c3a1	[DCCP]: Use constants for CCIDs With constants for CCID numbers this now uses them in some places. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-09-24 18:03:41 -03:00
Gerrit Renker	00e4d116a7	[DCCP]: Allow default/fallback service code. This has been discussed on dccp@vger and removes the necessity for applications to supply service codes in each and every case. If an application does not want to provide a service code, that's fine, it will be given 0. Otherwise, service codes can be set via socket options as before. This patch has been tested using various client/server configurations (including listening on multiple service codes). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-09-24 17:49:26 -03:00
Andrea Bittau	593f16aa62	[DCCP] CCID2: Add helper functions for changing important CCID2 state Introduce methods which manipulate interesting congestion control state such as pipe and rtt estimate. This is useful for people wishing to monitor the variables of CCID and instrument the code [perhaps using Kprobes]. Personally, I am a fan of encapsulation---that justifies this change =D. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:42 -07:00
Andrea Bittau	374bcf32c8	[DCCP] CCID2: Halve cwnd once upon multiple losses in a single RTT When multiple losses occur in one RTT, the window should be halved only once [a single "congestion event"]. This is now implemented, although not perfectly. Slightly changed the interface for changing the cwnd: pass hctx instead of dp. This is required in order to allow for change_cwnd to be called from _init(). Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:41 -07:00
Andrea Bittau	07978aabd5	[DCCP] CCID2: Allocate seq records on demand Allocate more sequence state on demand. Each time a packet is sent out by CCID2, a record of it needs to be kept. This list of records grows proportionally to cwnd. Previously, the length of this list was hardcored and therefore the cwnd could only grow to this value (of 128). Now, records are allocated on demand as necessary---cwnd may grow as it wishes. The exceptional case of when memory is not available is not handled gracefully. Perhaps, cwnd should be capped at that point. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:40 -07:00
Andrea Bittau	8d424f6ca2	[DCCP] CCID2: Add Kconfig option for CCID2 debug Allow the user to choose whether or not to enable CCID2 debugging via Kconfig. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:39 -07:00
Andrea Bittau	446dec30c7	[DCCP] CCID2: Tell DCCP to quickly check whether cwnd is available If not enough cwnd is available, tell the sender to check again as soon as possible. This will increase CPU utilization (polling frequently for cwnd) but will improve network performance. That is, the sender will need to wait less before detecting the increase of cwnd. A better architecture would be for the CCID to call-back (or dequeue) from DCCP when it is able to transmit traffic -- not the other way around as it currently occurs. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:39 -07:00
Andrea Bittau	d458c25ce2	[DCCP] CCID2: Initialize ssthresh to infinity Initialize the slow-start threshold to infinity. This way, upon connection initiation, slow-start will be exited only upon a packet loss. This patch will allow connections to quickly gain speed. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:11 -07:00
Andrea Bittau	29651cda97	[DCCP] CCID2: Fix jiffie wrap issues Jiffies are now handled correctly (I hope) in CCID2. If they wrap, no problem. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:10 -07:00
Andrea Bittau	4a0a50fb43	[DCCP] ackvec: Remove unused variables Get rid of unused variables in ackvector state. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:09 -07:00
Andrea Bittau	8e27e4650c	[DCCP] ackvec: Fix how DCCP_ACKVEC_STATE_NOT_RECEIVED is used Fix the way state is masked out. DCCP_ACKVEC_STATE_NOT_RECEIVED is defined as appears in the packet, therefore bit shifting is not required. This fix allows CCID2 to correctly detect losses. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:08 -07:00
Andrea Bittau	23d06e3b98	[DCCP] ACKVEC: fix ackvector length calculation Fix ackvector length calculation upon receiving an "ack-of-ack". This patch avoids the ackvector from growing too large which causes it to not be inserted into packets. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:07 -07:00
Dmitry Mishin	fda9ef5d67	[NET]: Fix sk->sk_filter field access Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg needlock = 0, while socket is not locked at that moment. In order to avoid this and similar issues in the future, use rcu for sk->sk_filter field read protection. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Kirill Korotaev <dev@openvz.org>	2006-09-22 15:18:47 -07:00
Ian McDonald	fc747e82b4	[DCCP]: Tidyup CCID3 list handling As Arnaldo Carvalho de Melo points out I should be using list_entry in case the structure changes in future. Current code functions but is reliant on position and requires type cast. Noticed when doing this that I have one more variable than I needed so removing that also. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:33 -07:00
Ian McDonald	97e5848dd3	[DCCP]: Introduce tx buffering This adds transmit buffering to DCCP. I have tested with CCID2/3 and with loss and rate limiting. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:17 -07:00
Ian McDonald	2a0109a707	[DCCP]: Shift sysctls into feat.h This shifts further sysctls into feat.h. No change in functionality - shifting code only. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:16 -07:00
YOSHIFUJI Hideaki	8e1ef0a95b	[IPV6]: Cache source address as well in ipv6_pinfo{}. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:45 -07:00
Herbert Xu	8f491069b4	[IPV4]: Use network-order dport for all visible inet_lookup_* Right now most inet_lookup_* functions take a host-order hnum instead of a network-order dport because that's how it is represented internally. This means that users of these functions have to be careful about using the right byte-order. To add more confusion, inet_lookup takes a network-order dport unlike all other functions. So this patch changes all visible inet_lookup functions to take a dport and move all dport->hnum conversion inside them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:14 -07:00
Venkat Yekkirala	4237c75c0a	[MLSXFRM]: Auto-labeling of child sockets This automatically labels the TCP, Unix stream, and dccp child sockets as well as openreqs to be at the same MLS level as the peer. This will result in the selection of appropriately labeled IPSec Security Associations. This also uses the sock's sid (as opposed to the isec sid) in SELinux enforcement of secmark in rcv_skb and postroute_last hooks. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:29 -07:00
Venkat Yekkirala	beb8d13bed	[MLSXFRM]: Add flow labeling This labels the flows that could utilize IPSec xfrms at the points the flows are defined so that IPSec policy and SAs at the right label can be used. The following protos are currently not handled, but they should continue to be able to use single-labeled IPSec like they currently do. ipmr ip_gre ipip igmp sit sctp ip6_tunnel (IPv6 over IPv6 tunnel device) decnet Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:27 -07:00
Ian McDonald	66a377c504	[DCCP]: Fix CCID3 This fixes CCID3 to give much closer performance to RFC4342. CCID3 is meant to alter sending rate based on RTT and loss. The performance was verified against: http://wand.net.nz/~perry/max_download.php For example I tested with netem and had the following parameters: Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%. This gives a theoretical speed of 71.9 Kbits/s. I measured across three runs with this patch set and got 70.1 Kbits/s. Without this patchset the average was 232 Kbits/s which means Linux can't be used for CCID3 research properly. I also tested with netem turned off so box just acting as router with 1.2 msec RTT. The performance with this is the same with or without the patch at around 30 Mbit/s. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-26 23:40:50 -07:00
Ian McDonald	80193aee18	[DCCP]: Introduce dccp_rx_hist_find_entry This adds a new function dccp_rx_hist_find_entry. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-26 19:07:36 -07:00
Ian McDonald	837d107cd1	[DCCP]: Introduces follows48 function This adds a new function to see if two sequence numbers follow each other. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-26 19:06:42 -07:00
Ian McDonald	e6bccd3573	[DCCP]: Update contact details and copyright Just updating copyright and contacts Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-26 19:01:30 -07:00
Ian McDonald	f3166c0717	[DCCP]: Fix typo This fixes a small typo in net/dccp/libs/packet_history.c Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-26 19:01:03 -07:00
Herbert Xu	497c615aba	[IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls The current users of ip6_dst_lookup can be divided into two classes: 1) The caller holds no locks and is in user-context (UDP). 2) The caller does not want to lookup the dst cache at all. The second class covers everyone except UDP because most people do the cache lookup directly before calling ip6_dst_lookup. This patch adds ip6_sk_dst_lookup for the first class. Similarly ip6_dst_store users can be divded into those that need to take the socket dst lock and those that don't. This patch adds __ip6_dst_store for those (everyone except UDP/datagram) that don't need an extra lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-02 13:38:14 -07:00
Ian McDonald	4b79f0af48	[DCCP]: Fix default sequence window size When using the default sequence window size (100) I got the following in my logs: Jun 22 14:24:09 localhost kernel: [ 1492.114775] DCCP: Step 6 failed for DATA packet, (LSWL(6279674225) <= P.seqno(6279674749) <= S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <= P.ackno(1125899906842620) <= S.AWH(18798206548), sending SYNC... Jun 22 14:24:09 localhost kernel: [ 1492.115147] DCCP: Step 6 failed for DATA packet, (LSWL(6279674225) <= P.seqno(6279674750) <= S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <= P.ackno(1125899906842620) <= S.AWH(18798206549), sending SYNC... I went to alter the default sysctl and it didn't take for new sockets. Below patch fixes this. I think the default is too low but it is what the DCCP spec specifies. As a side effect of this my rx speed using iperf goes from about 2.8 Mbits/sec to 3.5. This is still far too slow but it is a step in the right direction. Compile tested only for IPv6 but not particularly complex change. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-24 12:44:21 -07:00
Alan Cox	9faefb6d41	[DCCP]: Fix sparse warnings. No actual bugs that I can see just a couple of unmarked casts getting annoying in my debug log files. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-10 14:50:37 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Jean-Luc Leger	538c5902b8	[PATCH] clean up default value of IP_DCCP_ACKVEC Default values for boolean and tristate options can only be 'y', 'm' or 'n'. This patch removes wrong default for IP_DCCP_ACKVEC. Signed-off-by: Jean-Luc Leger <jean-luc.leger@dspnet.fr.eu.org> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-23 07:43:04 -07:00
Chris Leech	624d116473	[I/OAT]: Make sk_eat_skb I/OAT aware. Add an extra argument to sk_eat_skb, and make it move early copied packets to the async_wait_queue instead of freeing them. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:25:52 -07:00
Andrea Bittau	afec35e3fe	[DCCP] Ackvec: fix soft lockup in ackvec handling code A soft lockup existed in the handling of ack vector records. Specifically, when a tail of the list of ack vector records was removed, it was possible to end up iterating infinitely on an element of the tail. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-11 21:08:03 -07:00
Herbert Xu	134af34632	[DCCP]: Fix sock_orphan dead lock Calling sock_orphan inside bh_lock_sock in dccp_close can lead to dead locks. For example, the inet_diag code holds sk_callback_lock without disabling BH. If an inbound packet arrives during that admittedly tiny window, it will cause a dead lock on bh_lock_sock. Another possible path would be through sock_wfree if the network device driver frees the tx skb in process context with BH enabled. We can fix this by moving sock_orphan out of bh_lock_sock. The tricky bit is to work out when we need to destroy the socket ourselves and when it has already been destroyed by someone else. By moving sock_orphan before the release_sock we can solve this problem. This is because as long as we own the socket lock its state cannot change. So we simply record the socket state before the release_sock and then check the state again after we regain the socket lock. If the socket state has transitioned to DCCP_CLOSED in the time being, we know that the socket has been destroyed. Otherwise the socket is still ours to keep. This problem was discoverd by Ingo Molnar using his lock validator. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-05 17:09:13 -07:00
Eric Sesterhenn	b8282dcf04	[DCCP]: Fix leak in net/dccp/ipv4.c we dont free req if we cant parse the options. This fixes coverity bug id #1046 Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-11 17:21:06 -07:00
Randy Dunlap	68907dad58	[DCCP]: Use NULL for pointers, comfort sparse. From: Randy Dunlap <rdunlap@xenotime.net> Use NULL instead of 0 for pointers. Fix these sparse warnings: net/dccp/feat.c:207:20: warning: Using plain integer as NULL pointer net/dccp/feat.c:325:21: warning: Using plain integer as NULL pointer net/dccp/feat.c:526:20: warning: Using plain integer as NULL pointer Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-29 13:58:25 -08:00
Davide Libenzi	f348d70a32	[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:56 -08:00

1 2 3 4 5 ...

413 Commits