root/cebix/BasiliskII/src/slirp/tcp_input.c
Revision: 1.4
Committed: 2012-03-30T01:10:28Z by asvitkine
Content type: text/plain
Branch: MAIN
CVS Tags: HEAD
Changes since 1.3: +1 -5 lines
Log Message:
Switch slirp to 3-clause BSD license. This change went in upstream to QEMU's
version of slirp (where this code comes from), with the following checkin:

commit 2f5f89963186d42a7ded253bc6cf5b32abb45cec
Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Mon Jan 26 19:37:41 2009 +0000

    Remove the advertising clause from the slirp license

    According to the FSF, the 4-clause BSD license, which slirp is covered under,
    is not compatible with the GPL or LGPL[1].

    [1] http://www.fsf.org/licensing/licenses/index_html#GPLIncompatibleLicenses

    There are three declared copyright holders in slirp that use the 4-clause
    BSD license: the Regents of UC Berkeley, Danny Gasparovski, and Kelly Price.
    Below are the appropriate permissions from each party to remove the
    advertising clause from slirp.

    Special thanks go to Richard Fontana from Red Hat for contacting all of the
    necessary authors to resolve this issue!

    Regents of UC Berkeley:
    From ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change

    July 22, 1999

    To All Licensees, Distributors of Any Version of BSD:

    As you know, certain of the Berkeley Software Distribution ("BSD") source
    code files require that further distributions of products containing all or
    portions of the software, acknowledge within their advertising materials
    that such products contain software developed by UC Berkeley and its
    contributors.

    Specifically, the provision reads:

    "     * 3. All advertising materials mentioning features or use of this software
          *    must display the following acknowledgement:
          *    This product includes software developed by the University of
          *    California, Berkeley and its contributors."

    Effective immediately, licensees and distributors are no longer required to
    include the acknowledgement within advertising materials.  Accordingly, the
    foregoing paragraph of those BSD Unix files containing it is hereby deleted
    in its entirety.

    William Hoskins
    Director, Office of Technology Licensing
    University of California, Berkeley

    Danny Gasparovski:

    Subject: RE: Slirp license
    Date: Thu, 8 Jan 2009 10:51:00 +1100
    From: "Gasparovski, Daniel" <Daniel.Gasparovski@ato.gov.au>
    To: "Richard Fontana" <rfontana@redhat.com>

    Hi Richard,

    I have no objection to having Slirp code in QEMU be licensed under the
    3-clause BSD license.

    Thanks for taking the effort to consult me about this.


    Dan ...

    Kelly Price:

    Date: Thu, 8 Jan 2009 19:38:56 -0500
    From: "Kelly Price" <strredwolf@gmail.com>
    To: "Richard Fontana" <rfontana@redhat.com>
    Subject: Re: Slirp license

    Thanks for contacting me, Richard.  I'm glad you were able to find
    Dan, as I've been "keeping the light on" for Slirp.  I have no use for
    it now, and I have little time for it (now holding onto Keenspot's
    Comic Genesis and having a regular US state government position). If
    Dan would like to return to the project, I'd love to give it back to
    him.

    As for copyright, I don't own all of it.  Dan does, so I will defer to
    him.  Any of my patches I will gladly license to the 3-part BSD
    license.  My interest in re-licensing was because we didn't have ready
    info to contact Dan.  If Dan would like to port Slirp back out of
    QEMU, a lot of us 64-bit users would be grateful.

    Feel free to share this email address with Dan.  I will be glad to
    effect a transfer of the project to him and Mr. Bellard of the QEMU
    project.

    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


    git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6451 c046a42c-6fe2-441c-8c8c-71466251a162

File Contents

# Content
1 /*
2 * Copyright (c) 1982, 1986, 1988, 1990, 1993, 1994
3 * The Regents of the University of California. All rights reserved.
4 *
5 * Redistribution and use in source and binary forms, with or without
6 * modification, are permitted provided that the following conditions
7 * are met:
8 * 1. Redistributions of source code must retain the above copyright
9 * notice, this list of conditions and the following disclaimer.
10 * 2. Redistributions in binary form must reproduce the above copyright
11 * notice, this list of conditions and the following disclaimer in the
12 * documentation and/or other materials provided with the distribution.
13 * 3. Neither the name of the University nor the names of its contributors
14 * may be used to endorse or promote products derived from this software
15 * without specific prior written permission.
16 *
17 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
18 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
20 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
21 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
22 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
23 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
24 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
25 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
26 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
27 * SUCH DAMAGE.
28 *
29 * @(#)tcp_input.c 8.5 (Berkeley) 4/10/94
30 * tcp_input.c,v 1.10 1994/10/13 18:36:32 wollman Exp
31 */
32
33 /*
34 * Changes and additions relating to SLiRP
35 * Copyright (c) 1995 Danny Gasparovski.
36 *
37 * Please read the file COPYRIGHT for the
38 * terms and conditions of the copyright.
39 */
40
41 #include <stdlib.h>
42 #include <slirp.h>
43 #include "ip_icmp.h"
44
45 struct socket tcb;
46
47 int tcprexmtthresh = 3;
48 struct socket *tcp_last_so = &tcb;
49
50 tcp_seq tcp_iss; /* tcp initial send seq # */
51
52 #define TCP_PAWS_IDLE (24 * 24 * 60 * 60 * PR_SLOWHZ)
53
54 /* for modulo comparisons of timestamps */
55 #define TSTMP_LT(a,b) ((int)((a)-(b)) < 0)
56 #define TSTMP_GEQ(a,b) ((int)((a)-(b)) >= 0)
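/*
 * Editorial sketch, not part of tcp_input.c: the two macros above compare
 * timestamps modulo 2^32 by testing the sign of the difference, so a value
 * that has wrapped past zero still counts as "later". The helpers below
 * restate that idea with fixed-width types. tstmp_lt() and tstmp_geq() are
 * hypothetical names used only for this illustration, and the cast assumes
 * the usual two's-complement conversion from uint32_t to int32_t.
 */

```c
#include <stdint.h>

/* Illustration only: a is "before" b if the wrapped difference is negative. */
static int tstmp_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Illustration only: a is "at or after" b if the wrapped difference is >= 0. */
static int tstmp_geq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}
```

/*
 * With these helpers, tstmp_lt(0xFFFFFFF0, 0x00000010) is true: the second
 * value is numerically smaller, but it is only 32 ticks "ahead" once the
 * counter wraps. The comparison is meaningful only while the two values are
 * less than 2^31 apart.
 */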
57
58 /*
59 * Insert segment ti into reassembly queue of tcp with
60 * control block tp. Return TH_FIN if reassembly now includes
61 * a segment with FIN. The macro form does the common case inline
62 * (segment is the next to be received on an established connection,
63 * and the queue is empty), avoiding linkage into and removal
64 * from the queue and repetition of various conversions.
65 * Set DELACK for segments received in order, but ack immediately
66 * when segments are out of order (so fast retransmit can work).
67 */
68 #ifdef TCP_ACK_HACK
69 #define TCP_REASS(tp, ti, m, so, flags) {\
70 if ((ti)->ti_seq == (tp)->rcv_nxt && \
71 (tp)->seg_next == (tcpiphdrp_32)(tp) && \
72 (tp)->t_state == TCPS_ESTABLISHED) {\
73 if (ti->ti_flags & TH_PUSH) \
74 tp->t_flags |= TF_ACKNOW; \
75 else \
76 tp->t_flags |= TF_DELACK; \
77 (tp)->rcv_nxt += (ti)->ti_len; \
78 flags = (ti)->ti_flags & TH_FIN; \
79 tcpstat.tcps_rcvpack++;\
80 tcpstat.tcps_rcvbyte += (ti)->ti_len;\
81 if (so->so_emu) { \
82 if (tcp_emu((so),(m))) sbappend((so), (m)); \
83 } else \
84 sbappend((so), (m)); \
85 /* sorwakeup(so); */ \
86 } else {\
87 (flags) = tcp_reass((tp), (ti), (m)); \
88 tp->t_flags |= TF_ACKNOW; \
89 } \
90 }
91 #else
92 #define TCP_REASS(tp, ti, m, so, flags) { \
93 if ((ti)->ti_seq == (tp)->rcv_nxt && \
94 (tp)->seg_next == (tcpiphdrp_32)(tp) && \
95 (tp)->t_state == TCPS_ESTABLISHED) { \
96 tp->t_flags |= TF_DELACK; \
97 (tp)->rcv_nxt += (ti)->ti_len; \
98 flags = (ti)->ti_flags & TH_FIN; \
99 tcpstat.tcps_rcvpack++;\
100 tcpstat.tcps_rcvbyte += (ti)->ti_len;\
101 if (so->so_emu) { \
102 if (tcp_emu((so),(m))) sbappend(so, (m)); \
103 } else \
104 sbappend((so), (m)); \
105 /* sorwakeup(so); */ \
106 } else { \
107 (flags) = tcp_reass((tp), (ti), (m)); \
108 tp->t_flags |= TF_ACKNOW; \
109 } \
110 }
111 #endif
112
113 int
114 tcp_reass(tp, ti, m)
115 register struct tcpcb *tp;
116 register struct tcpiphdr *ti;
117 struct mbuf *m;
118 {
119 register struct tcpiphdr *q;
120 struct socket *so = tp->t_socket;
121 int flags;
122
123 /*
124 * Call with ti==0 after becoming established to
125 * force pre-ESTABLISHED data up to user socket.
126 */
127 if (ti == 0)
128 goto present;
129
130 /*
131 * Find a segment which begins after this one does.
132 */
133 for (q = (struct tcpiphdr *)tp->seg_next; q != (struct tcpiphdr *)tp;
134 q = (struct tcpiphdr *)q->ti_next)
135 if (SEQ_GT(q->ti_seq, ti->ti_seq))
136 break;
137
138 /*
139 * If there is a preceding segment, it may provide some of
140 * our data already. If so, drop the data from the incoming
141 * segment. If it provides all of our data, drop us.
142 */
143 if ((struct tcpiphdr *)q->ti_prev != (struct tcpiphdr *)tp) {
144 register int i;
145 q = (struct tcpiphdr *)q->ti_prev;
146 /* conversion to int (in i) handles seq wraparound */
147 i = q->ti_seq + q->ti_len - ti->ti_seq;
148 if (i > 0) {
149 if (i >= ti->ti_len) {
150 tcpstat.tcps_rcvduppack++;
151 tcpstat.tcps_rcvdupbyte += ti->ti_len;
152 m_freem(m);
153 /*
154 * Try to present any queued data
155 * at the left window edge to the user.
156 * This is needed after the 3-WHS
157 * completes.
158 */
159 goto present; /* ??? */
160 }
161 m_adj(m, i);
162 ti->ti_len -= i;
163 ti->ti_seq += i;
164 }
165 q = (struct tcpiphdr *)(q->ti_next);
166 }
167 tcpstat.tcps_rcvoopack++;
168 tcpstat.tcps_rcvoobyte += ti->ti_len;
169 REASS_MBUF(ti) = (mbufp_32) m; /* XXX */
170
171 /*
172 * While we overlap succeeding segments trim them or,
173 * if they are completely covered, dequeue them.
174 */
175 while (q != (struct tcpiphdr *)tp) {
176 register int i = (ti->ti_seq + ti->ti_len) - q->ti_seq;
177 if (i <= 0)
178 break;
179 if (i < q->ti_len) {
180 q->ti_seq += i;
181 q->ti_len -= i;
182 m_adj((struct mbuf *) REASS_MBUF(q), i);
183 break;
184 }
185 q = (struct tcpiphdr *)q->ti_next;
186 m = (struct mbuf *) REASS_MBUF((struct tcpiphdr *)q->ti_prev);
187 remque_32((void *)(q->ti_prev));
188 m_freem(m);
189 }
190
191 /*
192 * Stick new segment in its place.
193 */
194 insque_32(ti, (void *)(q->ti_prev));
195
196 present:
197 /*
198 * Present data to user, advancing rcv_nxt through
199 * completed sequence space.
200 */
201 if (!TCPS_HAVEESTABLISHED(tp->t_state))
202 return (0);
203 ti = (struct tcpiphdr *) tp->seg_next;
204 if (ti == (struct tcpiphdr *)tp || ti->ti_seq != tp->rcv_nxt)
205 return (0);
206 if (tp->t_state == TCPS_SYN_RECEIVED && ti->ti_len)
207 return (0);
208 do {
209 tp->rcv_nxt += ti->ti_len;
210 flags = ti->ti_flags & TH_FIN;
211 remque_32(ti);
212 m = (struct mbuf *) REASS_MBUF(ti); /* XXX */
213 ti = (struct tcpiphdr *)ti->ti_next;
214 /* if (so->so_state & SS_FCANTRCVMORE) */
215 if (so->so_state & SS_FCANTSENDMORE)
216 m_freem(m);
217 else {
218 if (so->so_emu) {
219 if (tcp_emu(so,m)) sbappend(so, m);
220 } else
221 sbappend(so, m);
222 }
223 } while (ti != (struct tcpiphdr *)tp && ti->ti_seq == tp->rcv_nxt);
224 /* sorwakeup(so); */
225 return (flags);
226 }
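/*
 * Editorial sketch, not part of tcp_input.c: tcp_reass() above trims the
 * front of an incoming segment when the preceding queued segment already
 * covers some of its bytes, and drops it outright when it is covered
 * completely. The fragment below isolates just that arithmetic. struct seg
 * and trim_front() are hypothetical names invented for this illustration;
 * the real code works on struct tcpiphdr and also adjusts the mbuf with
 * m_adj().
 */

```c
#include <stdint.h>

/* Illustration only: a segment reduced to its sequence number and length. */
struct seg {
    uint32_t seq;
    int32_t  len;
};

/*
 * Trim from `cur` any leading bytes already supplied by `prev`.
 * Returns 1 if `cur` is a complete duplicate (caller should drop it),
 * otherwise 0 with cur->seq and cur->len adjusted in place.
 */
static int trim_front(const struct seg *prev, struct seg *cur)
{
    /* Conversion to a signed value handles sequence wraparound. */
    int32_t i = (int32_t)(prev->seq + (uint32_t)prev->len - cur->seq);

    if (i > 0) {
        if (i >= cur->len)
            return 1;               /* every byte is already queued */
        cur->len -= i;              /* keep only the new tail */
        cur->seq += (uint32_t)i;
    }
    return 0;
}
```

/*
 * For a queued segment covering 100..149 and a new one starting at 120 with
 * 60 bytes, the overlap is 30 bytes, so the new segment becomes seq 150,
 * len 30. The same arithmetic holds when the sequence space wraps through
 * zero, which is why the difference is taken in a signed type.
 */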
227
228 /*
229 * TCP input routine; follows pages 65-76 of the TCP protocol
230 * specification (RFC 793, September 1981) very closely.
231 */
232 void
233 tcp_input(m, iphlen, inso)
234 register struct mbuf *m;
235 int iphlen;
236 struct socket *inso;
237 {
238 struct ip save_ip, *ip;
239 register struct tcpiphdr *ti;
240 caddr_t optp = NULL;
241 int optlen = 0;
242 int len, tlen, off;
243 register struct tcpcb *tp = 0;
244 register int tiflags;
245 struct socket *so = 0;
246 int todrop, acked, ourfinisacked, needoutput = 0;
247 /* int dropsocket = 0; */
248 int iss = 0;
249 u_long tiwin;
250 int ret;
251 /* int ts_present = 0; */
252
253 DEBUG_CALL("tcp_input");
254 DEBUG_ARGS((dfd," m = %8lx iphlen = %2d inso = %lx\n",
255 (long )m, iphlen, (long )inso ));
256
257 /*
258 * If called with m == 0, then we're continuing the connect
259 */
260 if (m == NULL) {
261 so = inso;
262
263 /* Re-set a few variables */
264 tp = sototcpcb(so);
265 m = so->so_m;
266 so->so_m = 0;
267 ti = so->so_ti;
268 tiwin = ti->ti_win;
269 tiflags = ti->ti_flags;
270
271 goto cont_conn;
272 }
273
274
275 tcpstat.tcps_rcvtotal++;
276 /*
277 * Get IP and TCP header together in first mbuf.
278 * Note: IP leaves IP header in first mbuf.
279 */
280 ti = mtod(m, struct tcpiphdr *);
281 if (iphlen > sizeof(struct ip )) {
282 ip_stripoptions(m, (struct mbuf *)0);
283 iphlen=sizeof(struct ip );
284 }
285 /* XXX Check if too short */
286
287
288 /*
289 * Save a copy of the IP header in case we want to restore it
290 * for sending an ICMP error message in response.
291 */
292 ip=mtod(m, struct ip *);
293 save_ip = *ip;
294 save_ip.ip_len+= iphlen;
295
296 /*
297 * Checksum extended TCP header and data.
298 */
299 tlen = ((struct ip *)ti)->ip_len;
300 ti->ti_next = ti->ti_prev = 0;
301 ti->ti_x1 = 0;
302 ti->ti_len = htons((u_int16_t)tlen);
303 len = sizeof(struct ip ) + tlen;
304 /* keep checksum for ICMP reply
305 * ti->ti_sum = cksum(m, len);
306 * if (ti->ti_sum) { */
307 if(cksum(m, len)) {
308 tcpstat.tcps_rcvbadsum++;
309 goto drop;
310 }
311
312 /*
313 * Check that TCP offset makes sense,
314 * pull out TCP options and adjust length. XXX
315 */
316 off = ti->ti_off << 2;
317 if (off < sizeof (struct tcphdr) || off > tlen) {
318 tcpstat.tcps_rcvbadoff++;
319 goto drop;
320 }
321 tlen -= off;
322 ti->ti_len = tlen;
323 if (off > sizeof (struct tcphdr)) {
324 optlen = off - sizeof (struct tcphdr);
325 optp = mtod(m, caddr_t) + sizeof (struct tcpiphdr);
326
327 /*
328 * Do quick retrieval of timestamp options ("options
329 * prediction?"). If timestamp is the only option and it's
330 * formatted as recommended in RFC 1323 appendix A, we
331 * quickly get the values now and not bother calling
332 * tcp_dooptions(), etc.
333 */
334 /* if ((optlen == TCPOLEN_TSTAMP_APPA ||
335 * (optlen > TCPOLEN_TSTAMP_APPA &&
336 * optp[TCPOLEN_TSTAMP_APPA] == TCPOPT_EOL)) &&
337 * *(u_int32_t *)optp == htonl(TCPOPT_TSTAMP_HDR) &&
338 * (ti->ti_flags & TH_SYN) == 0) {
339 * ts_present = 1;
340 * ts_val = ntohl(*(u_int32_t *)(optp + 4));
341 * ts_ecr = ntohl(*(u_int32_t *)(optp + 8));
342 * optp = NULL; / * we've parsed the options * /
343 * }
344 */
345 }
346 tiflags = ti->ti_flags;
347
348 /*
349 * Convert TCP protocol specific fields to host format.
350 */
351 NTOHL(ti->ti_seq);
352 NTOHL(ti->ti_ack);
353 NTOHS(ti->ti_win);
354 NTOHS(ti->ti_urp);
355
356 /*
357 * Drop TCP, IP headers and TCP options.
358 */
359 m->m_data += sizeof(struct tcpiphdr)+off-sizeof(struct tcphdr);
360 m->m_len -= sizeof(struct tcpiphdr)+off-sizeof(struct tcphdr);
361
362 /*
363 * Locate pcb for segment.
364 */
365 findso:
366 so = tcp_last_so;
367 if (so->so_fport != ti->ti_dport ||
368 so->so_lport != ti->ti_sport ||
369 so->so_laddr.s_addr != ti->ti_src.s_addr ||
370 so->so_faddr.s_addr != ti->ti_dst.s_addr) {
371 so = solookup(&tcb, ti->ti_src, ti->ti_sport,
372 ti->ti_dst, ti->ti_dport);
373 if (so)
374 tcp_last_so = so;
375 ++tcpstat.tcps_socachemiss;
376 }
377
378 /*
379 * If the state is CLOSED (i.e., TCB does not exist) then
380 * all data in the incoming segment is discarded.
381 * If the TCB exists but is in CLOSED state, it is embryonic,
382 * but should either do a listen or a connect soon.
383 *
384 * state == CLOSED means we've done socreate() but haven't
385 * attached it to a protocol yet...
386 *
387 * XXX If a TCB does not exist, and the TH_SYN flag is
388 * the only flag set, then create a session, mark it
389 * as if it was LISTENING, and continue...
390 */
391 if (so == 0) {
392 if ((tiflags & (TH_SYN|TH_FIN|TH_RST|TH_URG|TH_ACK)) != TH_SYN)
393 goto dropwithreset;
394
395 if ((so = socreate()) == NULL)
396 goto dropwithreset;
397 if (tcp_attach(so) < 0) {
398 free(so); /* Not sofree (if it failed, it's not insqued) */
399 goto dropwithreset;
400 }
401
402 sbreserve(&so->so_snd, tcp_sndspace);
403 sbreserve(&so->so_rcv, tcp_rcvspace);
404
405 /* tcp_last_so = so; */ /* XXX ? */
406 /* tp = sototcpcb(so); */
407
408 so->so_laddr = ti->ti_src;
409 so->so_lport = ti->ti_sport;
410 so->so_faddr = ti->ti_dst;
411 so->so_fport = ti->ti_dport;
412
413 if ((so->so_iptos = tcp_tos(so)) == 0)
414 so->so_iptos = ((struct ip *)ti)->ip_tos;
415
416 tp = sototcpcb(so);
417 tp->t_state = TCPS_LISTEN;
418 }
419
420 /*
421 * If this is a still-connecting socket, this is probably
422 * a retransmit of the SYN. Whether it's a retransmit SYN
423 * or something else, we nuke it.
424 */
425 if (so->so_state & SS_ISFCONNECTING)
426 goto drop;
427
428 tp = sototcpcb(so);
429
430 /* XXX Should never fail */
431 if (tp == 0)
432 goto dropwithreset;
433 if (tp->t_state == TCPS_CLOSED)
434 goto drop;
435
436 /* Unscale the window into a 32-bit value. */
437 /* if ((tiflags & TH_SYN) == 0)
438 * tiwin = ti->ti_win << tp->snd_scale;
439 * else
440 */
441 tiwin = ti->ti_win;
442
443 /*
444 * Segment received on connection.
445 * Reset idle time and keep-alive timer.
446 */
447 tp->t_idle = 0;
448 if (so_options)
449 tp->t_timer[TCPT_KEEP] = tcp_keepintvl;
450 else
451 tp->t_timer[TCPT_KEEP] = tcp_keepidle;
452
453 /*
454 * Process options if not in LISTEN state,
455 * else do it below (after getting remote address).
456 */
457 if (optp && tp->t_state != TCPS_LISTEN)
458 tcp_dooptions(tp, (u_char *)optp, optlen, ti);
459 /* , */
460 /* &ts_present, &ts_val, &ts_ecr); */
461
462 /*
463 * Header prediction: check for the two common cases
464 * of a uni-directional data xfer. If the packet has
465 * no control flags, is in-sequence, the window didn't
466 * change and we're not retransmitting, it's a
467 * candidate. If the length is zero and the ack moved
468 * forward, we're the sender side of the xfer. Just
469 * free the data acked & wake any higher level process
470 * that was blocked waiting for space. If the length
471 * is non-zero and the ack didn't move, we're the
472 * receiver side. If we're getting packets in-order
473 * (the reassembly queue is empty), add the data to
474 * the socket buffer and note that we need a delayed ack.
475 *
476 * XXX Some of these tests are not needed
477 * eg: the tiwin == tp->snd_wnd prevents many more
478 * predictions.. with no *real* advantage..
479 */
480 if (tp->t_state == TCPS_ESTABLISHED &&
481 (tiflags & (TH_SYN|TH_FIN|TH_RST|TH_URG|TH_ACK)) == TH_ACK &&
482 /* (!ts_present || TSTMP_GEQ(ts_val, tp->ts_recent)) && */
483 ti->ti_seq == tp->rcv_nxt &&
484 tiwin && tiwin == tp->snd_wnd &&
485 tp->snd_nxt == tp->snd_max) {
486 /*
487 * If last ACK falls within this segment's sequence numbers,
488 * record the timestamp.
489 */
490 /* if (ts_present && SEQ_LEQ(ti->ti_seq, tp->last_ack_sent) &&
491 * SEQ_LT(tp->last_ack_sent, ti->ti_seq + ti->ti_len)) {
492 * tp->ts_recent_age = tcp_now;
493 * tp->ts_recent = ts_val;
494 * }
495 */
496 if (ti->ti_len == 0) {
497 if (SEQ_GT(ti->ti_ack, tp->snd_una) &&
498 SEQ_LEQ(ti->ti_ack, tp->snd_max) &&
499 tp->snd_cwnd >= tp->snd_wnd) {
500 /*
501 * this is a pure ack for outstanding data.
502 */
503 ++tcpstat.tcps_predack;
504 /* if (ts_present)
505 * tcp_xmit_timer(tp, tcp_now-ts_ecr+1);
506 * else
507 */ if (tp->t_rtt &&
508 SEQ_GT(ti->ti_ack, tp->t_rtseq))
509 tcp_xmit_timer(tp, tp->t_rtt);
510 acked = ti->ti_ack - tp->snd_una;
511 tcpstat.tcps_rcvackpack++;
512 tcpstat.tcps_rcvackbyte += acked;
513 sbdrop(&so->so_snd, acked);
514 tp->snd_una = ti->ti_ack;
515 m_freem(m);
516
517 /*
518 * If all outstanding data are acked, stop
519 * retransmit timer, otherwise restart timer
520 * using current (possibly backed-off) value.
521 * If process is waiting for space,
522 * wakeup/selwakeup/signal. If data
523 * are ready to send, let tcp_output
524 * decide between more output or persist.
525 */
526 if (tp->snd_una == tp->snd_max)
527 tp->t_timer[TCPT_REXMT] = 0;
528 else if (tp->t_timer[TCPT_PERSIST] == 0)
529 tp->t_timer[TCPT_REXMT] = tp->t_rxtcur;
530
531 /*
532 * There's room in so_snd, sowwakeup will read()
533 * from the socket if we can
534 */
535 /* if (so->so_snd.sb_flags & SB_NOTIFY)
536 * sowwakeup(so);
537 */
538 /*
539 * This is called because sowwakeup might have
540 * put data into so_snd. Since we don't do sowwakeup,
541 * we don't need this.. XXX???
542 */
543 if (so->so_snd.sb_cc)
544 (void) tcp_output(tp);
545
546 return;
547 }
548 } else if (ti->ti_ack == tp->snd_una &&
549 tp->seg_next == (tcpiphdrp_32)tp &&
550 ti->ti_len <= sbspace(&so->so_rcv)) {
551 /*
552 * this is a pure, in-sequence data packet
553 * with nothing on the reassembly queue and
554 * we have enough buffer space to take it.
555 */
556 ++tcpstat.tcps_preddat;
557 tp->rcv_nxt += ti->ti_len;
558 tcpstat.tcps_rcvpack++;
559 tcpstat.tcps_rcvbyte += ti->ti_len;
560 /*
561 * Add data to socket buffer.
562 */
563 if (so->so_emu) {
564 if (tcp_emu(so,m)) sbappend(so, m);
565 } else
566 sbappend(so, m);
567
568 /*
569 * XXX This is called when data arrives. Later, check
570 * if we can actually write() to the socket
571 * XXX Need to check? It'll be NON_BLOCKING
572 */
573 /* sorwakeup(so); */
574
575 /*
576 * If this is a short packet, then ACK now - with Nagle's
577 * algorithm the sender won't send more until it
578 * gets an ACK.
579 *
580 * It is better to not delay acks at all to maximize
581 * TCP throughput. See RFC 2581.
582 */
583 tp->t_flags |= TF_ACKNOW;
584 tcp_output(tp);
585 return;
586 }
587 } /* header prediction */
588 /*
589 * Calculate amount of space in receive window,
590 * and then do TCP input processing.
591 * Receive window is amount of space in rcv queue,
592 * but not less than advertised window.
593 */
594 { int win;
595 win = sbspace(&so->so_rcv);
596 if (win < 0)
597 win = 0;
598 tp->rcv_wnd = max(win, (int)(tp->rcv_adv - tp->rcv_nxt));
599 }
600
601 switch (tp->t_state) {
602
603 /*
604 * If the state is LISTEN then ignore segment if it contains an RST.
605 * If the segment contains an ACK then it is bad and send a RST.
606 * If it does not contain a SYN then it is not interesting; drop it.
607 * Don't bother responding if the destination was a broadcast.
608 * Otherwise initialize tp->rcv_nxt, and tp->irs, select an initial
609 * tp->iss, and send a segment:
610 * <SEQ=ISS><ACK=RCV_NXT><CTL=SYN,ACK>
611 * Also initialize tp->snd_nxt to tp->iss+1 and tp->snd_una to tp->iss.
612 * Fill in remote peer address fields if not previously specified.
613 * Enter SYN_RECEIVED state, and process any other fields of this
614 * segment in this state.
615 */
616 case TCPS_LISTEN: {
617
618 if (tiflags & TH_RST)
619 goto drop;
620 if (tiflags & TH_ACK)
621 goto dropwithreset;
622 if ((tiflags & TH_SYN) == 0)
623 goto drop;
624
625 /*
626 * This has way too many gotos...
627 * But a bit of spaghetti code never hurt anybody :)
628 */
629
630 /*
631 * If this is destined for the control address, then flag to
632 * tcp_ctl once connected, otherwise connect
633 */
634 if ((so->so_faddr.s_addr&htonl(0xffffff00)) == special_addr.s_addr) {
635 int lastbyte=ntohl(so->so_faddr.s_addr) & 0xff;
636 if (lastbyte!=CTL_ALIAS && lastbyte!=CTL_DNS) {
637 #if 0
638 if(lastbyte==CTL_CMD || lastbyte==CTL_EXEC) {
639 /* Command or exec address */
640 so->so_state |= SS_CTL;
641 } else
642 #endif
643 {
644 /* May be an add exec */
645 struct ex_list *ex_ptr;
646 for(ex_ptr = exec_list; ex_ptr; ex_ptr = ex_ptr->ex_next) {
647 if(ex_ptr->ex_fport == so->so_fport &&
648 lastbyte == ex_ptr->ex_addr) {
649 so->so_state |= SS_CTL;
650 break;
651 }
652 }
653 }
654 if(so->so_state & SS_CTL) goto cont_input;
655 }
656 /* CTL_ALIAS: Do nothing, tcp_fconnect will be called on it */
657 }
658
659 if (so->so_emu & EMU_NOCONNECT) {
660 so->so_emu &= ~EMU_NOCONNECT;
661 goto cont_input;
662 }
663
664 if((tcp_fconnect(so) == -1) && (errno != EINPROGRESS) && (errno != EWOULDBLOCK)) {
665 u_char code=ICMP_UNREACH_NET;
666 DEBUG_MISC((dfd," tcp fconnect errno = %d-%s\n",
667 errno,strerror(errno)));
668 if(errno == ECONNREFUSED) {
669 /* ACK the SYN, send RST to refuse the connection */
670 tcp_respond(tp, ti, m, ti->ti_seq+1, (tcp_seq)0,
671 TH_RST|TH_ACK);
672 } else {
673 if(errno == EHOSTUNREACH) code=ICMP_UNREACH_HOST;
674 HTONL(ti->ti_seq); /* restore tcp header */
675 HTONL(ti->ti_ack);
676 HTONS(ti->ti_win);
677 HTONS(ti->ti_urp);
678 m->m_data -= sizeof(struct tcpiphdr)+off-sizeof(struct tcphdr);
679 m->m_len += sizeof(struct tcpiphdr)+off-sizeof(struct tcphdr);
680 *ip=save_ip;
681 icmp_error(m, ICMP_UNREACH,code, 0,strerror(errno));
682 }
683 tp = tcp_close(tp);
684 m_free(m);
685 } else {
686 /*
687 * Haven't connected yet, save the current mbuf
688 * and ti, and return
689 * XXX Some OS's don't tell us whether the connect()
690 * succeeded or not. So we must time it out.
691 */
692 so->so_m = m;
693 so->so_ti = ti;
694 tp->t_timer[TCPT_KEEP] = TCPTV_KEEP_INIT;
695 tp->t_state = TCPS_SYN_RECEIVED;
696 }
697 return;
698
699 cont_conn:
700 /* m==NULL
701 * Check if the connect succeeded
702 */
703 if (so->so_state & SS_NOFDREF) {
704 tp = tcp_close(tp);
705 goto dropwithreset;
706 }
707 cont_input:
708 tcp_template(tp);
709
710 if (optp)
711 tcp_dooptions(tp, (u_char *)optp, optlen, ti);
712 /* , */
713 /* &ts_present, &ts_val, &ts_ecr); */
714
715 if (iss)
716 tp->iss = iss;
717 else
718 tp->iss = tcp_iss;
719 tcp_iss += TCP_ISSINCR/2;
720 tp->irs = ti->ti_seq;
721 tcp_sendseqinit(tp);
722 tcp_rcvseqinit(tp);
723 tp->t_flags |= TF_ACKNOW;
724 tp->t_state = TCPS_SYN_RECEIVED;
725 tp->t_timer[TCPT_KEEP] = TCPTV_KEEP_INIT;
726 tcpstat.tcps_accepts++;
727 goto trimthenstep6;
728 } /* case TCPS_LISTEN */
729
730 /*
731 * If the state is SYN_SENT:
732 * if seg contains an ACK, but not for our SYN, drop the input.
733 * if seg contains a RST, then drop the connection.
734 * if seg does not contain SYN, then drop it.
735 * Otherwise this is an acceptable SYN segment
736 * initialize tp->rcv_nxt and tp->irs
737 * if seg contains ack then advance tp->snd_una
738 * if SYN has been acked change to ESTABLISHED else SYN_RCVD state
739 * arrange for segment to be acked (eventually)
740 * continue processing rest of data/controls, beginning with URG
741 */
742 case TCPS_SYN_SENT:
743 if ((tiflags & TH_ACK) &&
744 (SEQ_LEQ(ti->ti_ack, tp->iss) ||
745 SEQ_GT(ti->ti_ack, tp->snd_max)))
746 goto dropwithreset;
747
748 if (tiflags & TH_RST) {
749 if (tiflags & TH_ACK)
750 tp = tcp_drop(tp,0); /* XXX Check t_softerror! */
751 goto drop;
752 }
753
754 if ((tiflags & TH_SYN) == 0)
755 goto drop;
756 if (tiflags & TH_ACK) {
757 tp->snd_una = ti->ti_ack;
758 if (SEQ_LT(tp->snd_nxt, tp->snd_una))
759 tp->snd_nxt = tp->snd_una;
760 }
761
762 tp->t_timer[TCPT_REXMT] = 0;
763 tp->irs = ti->ti_seq;
764 tcp_rcvseqinit(tp);
765 tp->t_flags |= TF_ACKNOW;
766 if (tiflags & TH_ACK && SEQ_GT(tp->snd_una, tp->iss)) {
767 tcpstat.tcps_connects++;
768 soisfconnected(so);
769 tp->t_state = TCPS_ESTABLISHED;
770
771 /* Do window scaling on this connection? */
772 /* if ((tp->t_flags & (TF_RCVD_SCALE|TF_REQ_SCALE)) ==
773 * (TF_RCVD_SCALE|TF_REQ_SCALE)) {
774 * tp->snd_scale = tp->requested_s_scale;
775 * tp->rcv_scale = tp->request_r_scale;
776 * }
777 */
778 (void) tcp_reass(tp, (struct tcpiphdr *)0,
779 (struct mbuf *)0);
780 /*
781 * if we didn't have to retransmit the SYN,
782 * use its rtt as our initial srtt & rtt var.
783 */
784 if (tp->t_rtt)
785 tcp_xmit_timer(tp, tp->t_rtt);
786 } else
787 tp->t_state = TCPS_SYN_RECEIVED;
788
789 trimthenstep6:
790 /*
791 * Advance ti->ti_seq to correspond to first data byte.
792 * If data, trim to stay within window,
793 * dropping FIN if necessary.
794 */
795 ti->ti_seq++;
796 if (ti->ti_len > tp->rcv_wnd) {
797 todrop = ti->ti_len - tp->rcv_wnd;
798 m_adj(m, -todrop);
799 ti->ti_len = tp->rcv_wnd;
800 tiflags &= ~TH_FIN;
801 tcpstat.tcps_rcvpackafterwin++;
802 tcpstat.tcps_rcvbyteafterwin += todrop;
803 }
804 tp->snd_wl1 = ti->ti_seq - 1;
805 tp->rcv_up = ti->ti_seq;
806 goto step6;
807 } /* switch tp->t_state */
808 /*
809 * States other than LISTEN or SYN_SENT.
810 * First check timestamp, if present.
811 * Then check that at least some bytes of segment are within
812 * receive window. If segment begins before rcv_nxt,
813 * drop leading data (and SYN); if nothing left, just ack.
814 *
815 * RFC 1323 PAWS: If we have a timestamp reply on this segment
816 * and it's less than ts_recent, drop it.
817 */
818 /* if (ts_present && (tiflags & TH_RST) == 0 && tp->ts_recent &&
819 * TSTMP_LT(ts_val, tp->ts_recent)) {
820 *
821 */ /* Check to see if ts_recent is over 24 days old. */
822 /* if ((int)(tcp_now - tp->ts_recent_age) > TCP_PAWS_IDLE) {
823 */ /*
824 * * Invalidate ts_recent. If this segment updates
825 * * ts_recent, the age will be reset later and ts_recent
826 * * will get a valid value. If it does not, setting
827 * * ts_recent to zero will at least satisfy the
828 * * requirement that zero be placed in the timestamp
829 * * echo reply when ts_recent isn't valid. The
830 * * age isn't reset until we get a valid ts_recent
831 * * because we don't want out-of-order segments to be
832 * * dropped when ts_recent is old.
833 * */
834 /* tp->ts_recent = 0;
835 * } else {
836 * tcpstat.tcps_rcvduppack++;
837 * tcpstat.tcps_rcvdupbyte += ti->ti_len;
838 * tcpstat.tcps_pawsdrop++;
839 * goto dropafterack;
840 * }
841 * }
842 */
843
844 todrop = tp->rcv_nxt - ti->ti_seq;
845 if (todrop > 0) {
846 if (tiflags & TH_SYN) {
847 tiflags &= ~TH_SYN;
848 ti->ti_seq++;
849 if (ti->ti_urp > 1)
850 ti->ti_urp--;
851 else
852 tiflags &= ~TH_URG;
853 todrop--;
854 }
855 /*
856 * Following if statement from Stevens, vol. 2, p. 960.
857 */
858 if (todrop > ti->ti_len
859 || (todrop == ti->ti_len && (tiflags & TH_FIN) == 0)) {
860 /*
861 * Any valid FIN must be to the left of the window.
862 * At this point the FIN must be a duplicate or out
863 * of sequence; drop it.
864 */
865 tiflags &= ~TH_FIN;
866
867 /*
868 * Send an ACK to resynchronize and drop any data.
869 * But keep on processing for RST or ACK.
870 */
871 tp->t_flags |= TF_ACKNOW;
872 todrop = ti->ti_len;
873 tcpstat.tcps_rcvduppack++;
874 tcpstat.tcps_rcvdupbyte += todrop;
875 } else {
876 tcpstat.tcps_rcvpartduppack++;
877 tcpstat.tcps_rcvpartdupbyte += todrop;
878 }
879 m_adj(m, todrop);
880 ti->ti_seq += todrop;
881 ti->ti_len -= todrop;
882 if (ti->ti_urp > todrop)
883 ti->ti_urp -= todrop;
884 else {
885 tiflags &= ~TH_URG;
886 ti->ti_urp = 0;
887 }
888 }
889 /*
890 * If new data are received on a connection after the
891 * user processes are gone, then RST the other end.
892 */
893 if ((so->so_state & SS_NOFDREF) &&
894 tp->t_state > TCPS_CLOSE_WAIT && ti->ti_len) {
895 tp = tcp_close(tp);
896 tcpstat.tcps_rcvafterclose++;
897 goto dropwithreset;
898 }
899
900 /*
901 * If segment ends after window, drop trailing data
902 * (and PUSH and FIN); if nothing left, just ACK.
903 */
904 todrop = (ti->ti_seq+ti->ti_len) - (tp->rcv_nxt+tp->rcv_wnd);
905 if (todrop > 0) {
906 tcpstat.tcps_rcvpackafterwin++;
907 if (todrop >= ti->ti_len) {
908 tcpstat.tcps_rcvbyteafterwin += ti->ti_len;
909 /*
910 * If a new connection request is received
911 * while in TIME_WAIT, drop the old connection
912 * and start over if the sequence numbers
913 * are above the previous ones.
914 */
915 if (tiflags & TH_SYN &&
916 tp->t_state == TCPS_TIME_WAIT &&
917 SEQ_GT(ti->ti_seq, tp->rcv_nxt)) {
918 iss = tp->rcv_nxt + TCP_ISSINCR;
919 tp = tcp_close(tp);
920 goto findso;
921 }
922 /*
923 * If window is closed can only take segments at
924 * window edge, and have to drop data and PUSH from
925 * incoming segments. Continue processing, but
926 * remember to ack. Otherwise, drop segment
927 * and ack.
928 */
929 if (tp->rcv_wnd == 0 && ti->ti_seq == tp->rcv_nxt) {
930 tp->t_flags |= TF_ACKNOW;
931 tcpstat.tcps_rcvwinprobe++;
932 } else
933 goto dropafterack;
934 } else
935 tcpstat.tcps_rcvbyteafterwin += todrop;
936 m_adj(m, -todrop);
937 ti->ti_len -= todrop;
938 tiflags &= ~(TH_PUSH|TH_FIN);
939 }
940
941 /*
942 * If last ACK falls within this segment's sequence numbers,
943 * record its timestamp.
944 */
945 /* if (ts_present && SEQ_LEQ(ti->ti_seq, tp->last_ack_sent) &&
946 * SEQ_LT(tp->last_ack_sent, ti->ti_seq + ti->ti_len +
947 * ((tiflags & (TH_SYN|TH_FIN)) != 0))) {
948 * tp->ts_recent_age = tcp_now;
949 * tp->ts_recent = ts_val;
950 * }
951 */
952
953 /*
954 * If the RST bit is set examine the state:
955 * SYN_RECEIVED STATE:
956 * If passive open, return to LISTEN state.
957 * If active open, inform user that connection was refused.
958 * ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT STATES:
959 * Inform user that connection was reset, and close tcb.
960 * CLOSING, LAST_ACK, TIME_WAIT STATES
961 * Close the tcb.
962 */
963 if (tiflags&TH_RST) switch (tp->t_state) {
964
965 case TCPS_SYN_RECEIVED:
966 /* so->so_error = ECONNREFUSED; */
967 goto close;
968
969 case TCPS_ESTABLISHED:
970 case TCPS_FIN_WAIT_1:
971 case TCPS_FIN_WAIT_2:
972 case TCPS_CLOSE_WAIT:
973 /* so->so_error = ECONNRESET; */
974 close:
975 tp->t_state = TCPS_CLOSED;
976 tcpstat.tcps_drops++;
977 tp = tcp_close(tp);
978 goto drop;
979
980 case TCPS_CLOSING:
981 case TCPS_LAST_ACK:
982 case TCPS_TIME_WAIT:
983 tp = tcp_close(tp);
984 goto drop;
985 }
986
987 /*
988 * If a SYN is in the window, then this is an
989 * error and we send an RST and drop the connection.
990 */
991 if (tiflags & TH_SYN) {
992 tp = tcp_drop(tp,0);
993 goto dropwithreset;
994 }
995
996 /*
997 * If the ACK bit is off we drop the segment and return.
998 */
999 if ((tiflags & TH_ACK) == 0) goto drop;
1000
1001 /*
1002 * Ack processing.
1003 */
1004 switch (tp->t_state) {
1005 /*
1006 * In SYN_RECEIVED state if the ack ACKs our SYN then enter
1007 * ESTABLISHED state and continue processing, otherwise
1008 * send an RST. una<=ack<=max
1009 */
1010 case TCPS_SYN_RECEIVED:
1011
1012 if (SEQ_GT(tp->snd_una, ti->ti_ack) ||
1013 SEQ_GT(ti->ti_ack, tp->snd_max))
1014 goto dropwithreset;
1015 tcpstat.tcps_connects++;
1016 tp->t_state = TCPS_ESTABLISHED;
1017 /*
1018 * The sent SYN is ack'ed with our sequence number + 1.
1019 * The first data byte already in the buffer will get
1020 * lost if no correction is made. This is only needed for
1021 * SS_CTL since the buffer is empty otherwise.
1022 * tp->snd_una++; or:
1023 */
1024 tp->snd_una=ti->ti_ack;
1025 if (so->so_state & SS_CTL) {
1026 /* So tcp_ctl reports the right state */
1027 ret = tcp_ctl(so);
1028 if (ret == 1) {
1029 soisfconnected(so);
1030 so->so_state &= ~SS_CTL; /* success XXX */
1031 } else if (ret == 2) {
1032 so->so_state = SS_NOFDREF; /* CTL_CMD */
1033 } else {
1034 needoutput = 1;
1035 tp->t_state = TCPS_FIN_WAIT_1;
1036 }
1037 } else {
1038 soisfconnected(so);
1039 }
1040
1041 /* Do window scaling? */
1042 /* if ((tp->t_flags & (TF_RCVD_SCALE|TF_REQ_SCALE)) ==
1043 * (TF_RCVD_SCALE|TF_REQ_SCALE)) {
1044 * tp->snd_scale = tp->requested_s_scale;
1045 * tp->rcv_scale = tp->request_r_scale;
1046 * }
1047 */
1048 (void) tcp_reass(tp, (struct tcpiphdr *)0, (struct mbuf *)0);
1049 tp->snd_wl1 = ti->ti_seq - 1;
1050 /* Avoid ack processing; snd_una==ti_ack => dup ack */
1051 goto synrx_to_est;
1052 /* fall into ... */
1053
1054 /*
1055 * In ESTABLISHED state: drop duplicate ACKs; ACK out of range
1056 * ACKs. If the ack is in the range
1057 * tp->snd_una < ti->ti_ack <= tp->snd_max
1058 * then advance tp->snd_una to ti->ti_ack and drop
1059 * data from the retransmission queue. If this ACK reflects
1060 * more up to date window information we update our window information.
1061 */
1062 case TCPS_ESTABLISHED:
1063 case TCPS_FIN_WAIT_1:
1064 case TCPS_FIN_WAIT_2:
1065 case TCPS_CLOSE_WAIT:
1066 case TCPS_CLOSING:
1067 case TCPS_LAST_ACK:
1068 case TCPS_TIME_WAIT:
1069
1070 if (SEQ_LEQ(ti->ti_ack, tp->snd_una)) {
1071 if (ti->ti_len == 0 && tiwin == tp->snd_wnd) {
1072 tcpstat.tcps_rcvdupack++;
1073 DEBUG_MISC((dfd," dup ack m = %lx so = %lx \n",
1074 (long )m, (long )so));
1075 /*
1076 * If we have outstanding data (other than
1077 * a window probe), this is a completely
1078 * duplicate ack (ie, window info didn't
1079 * change), the ack is the biggest we've
1080 * seen and we've seen exactly our rexmt
1081 * threshold of them, assume a packet
1082 * has been dropped and retransmit it.
1083 * Kludge snd_nxt & the congestion
1084 * window so we send only this one
1085 * packet.
1086 *
1087 * We know we're losing at the current
1088 * window size so do congestion avoidance
1089 * (set ssthresh to half the current window
1090 * and pull our congestion window back to
1091 * the new ssthresh).
1092 *
1093 * Dup acks mean that packets have left the
1094 * network (they're now cached at the receiver)
1095 * so bump cwnd by the amount in the receiver
1096 * to keep a constant cwnd packets in the
1097 * network.
1098 */
1099 if (tp->t_timer[TCPT_REXMT] == 0 ||
1100 ti->ti_ack != tp->snd_una)
1101 tp->t_dupacks = 0;
1102 else if (++tp->t_dupacks == tcprexmtthresh) {
1103 tcp_seq onxt = tp->snd_nxt;
1104 u_int win =
1105 min(tp->snd_wnd, tp->snd_cwnd) / 2 /
1106 tp->t_maxseg;
1107
1108 if (win < 2)
1109 win = 2;
1110 tp->snd_ssthresh = win * tp->t_maxseg;
1111 tp->t_timer[TCPT_REXMT] = 0;
1112 tp->t_rtt = 0;
1113 tp->snd_nxt = ti->ti_ack;
1114 tp->snd_cwnd = tp->t_maxseg;
1115 (void) tcp_output(tp);
1116 tp->snd_cwnd = tp->snd_ssthresh +
1117 tp->t_maxseg * tp->t_dupacks;
1118 if (SEQ_GT(onxt, tp->snd_nxt))
1119 tp->snd_nxt = onxt;
1120 goto drop;
1121 } else if (tp->t_dupacks > tcprexmtthresh) {
1122 tp->snd_cwnd += tp->t_maxseg;
1123 (void) tcp_output(tp);
1124 goto drop;
1125 }
1126 } else
1127 tp->t_dupacks = 0;
1128 break;
1129 }
1130 synrx_to_est:
1131 /*
1132 * If the congestion window was inflated to account
1133 * for the other side's cached packets, retract it.
1134 */
1135 if (tp->t_dupacks > tcprexmtthresh &&
1136 tp->snd_cwnd > tp->snd_ssthresh)
1137 tp->snd_cwnd = tp->snd_ssthresh;
1138 tp->t_dupacks = 0;
1139 if (SEQ_GT(ti->ti_ack, tp->snd_max)) {
1140 tcpstat.tcps_rcvacktoomuch++;
1141 goto dropafterack;
1142 }
1143 acked = ti->ti_ack - tp->snd_una;
1144 tcpstat.tcps_rcvackpack++;
1145 tcpstat.tcps_rcvackbyte += acked;
1146
1147 /*
1148 * If we have a timestamp reply, update smoothed
1149 * round trip time. If no timestamp is present but
1150 * transmit timer is running and timed sequence
1151 * number was acked, update smoothed round trip time.
1152 * Since we now have an rtt measurement, cancel the
1153 * timer backoff (cf., Phil Karn's retransmit alg.).
1154 * Recompute the initial retransmit timer.
1155 */
1156 /* if (ts_present)
1157 * tcp_xmit_timer(tp, tcp_now-ts_ecr+1);
1158 * else
1159 */
1160 if (tp->t_rtt && SEQ_GT(ti->ti_ack, tp->t_rtseq))
1161 tcp_xmit_timer(tp,tp->t_rtt);
1162
1163 /*
1164 * If all outstanding data is acked, stop retransmit
1165 * timer and remember to restart (more output or persist).
1166 * If there is more data to be acked, restart retransmit
1167 * timer, using current (possibly backed-off) value.
1168 */
1169 if (ti->ti_ack == tp->snd_max) {
1170 tp->t_timer[TCPT_REXMT] = 0;
1171 needoutput = 1;
1172 } else if (tp->t_timer[TCPT_PERSIST] == 0)
1173 tp->t_timer[TCPT_REXMT] = tp->t_rxtcur;
1174 /*
1175 * When new data is acked, open the congestion window.
1176 * If the window gives us less than ssthresh packets
1177 * in flight, open exponentially (maxseg per packet).
1178 * Otherwise open linearly: maxseg per window
1179 * (maxseg^2 / cwnd per packet).
1180 */
1181 {
1182 register u_int cw = tp->snd_cwnd;
1183 register u_int incr = tp->t_maxseg;
1184
1185 if (cw > tp->snd_ssthresh)
1186 incr = incr * incr / cw;
1187 tp->snd_cwnd = min(cw + incr, TCP_MAXWIN<<tp->snd_scale);
1188 }
1189 if (acked > so->so_snd.sb_cc) {
1190 tp->snd_wnd -= so->so_snd.sb_cc;
1191 sbdrop(&so->so_snd, (int )so->so_snd.sb_cc);
1192 ourfinisacked = 1;
1193 } else {
1194 sbdrop(&so->so_snd, acked);
1195 tp->snd_wnd -= acked;
1196 ourfinisacked = 0;
1197 }
1198 /*
1199 * XXX sowwakeup is called when data is acked and there's room
1200 * for more data... it should read() the socket
1201 */
1202 /* if (so->so_snd.sb_flags & SB_NOTIFY)
1203 * sowwakeup(so);
1204 */
1205 tp->snd_una = ti->ti_ack;
1206 if (SEQ_LT(tp->snd_nxt, tp->snd_una))
1207 tp->snd_nxt = tp->snd_una;
1208
1209 switch (tp->t_state) {
1210
1211 /*
1212 * In FIN_WAIT_1 STATE in addition to the processing
1213 * for the ESTABLISHED state if our FIN is now acknowledged
1214 * then enter FIN_WAIT_2.
1215 */
1216 case TCPS_FIN_WAIT_1:
1217 if (ourfinisacked) {
1218 /*
1219 * If we can't receive any more
1220 * data, then closing user can proceed.
1221 * Starting the timer is contrary to the
1222 * specification, but if we don't get a FIN
1223 * we'll hang forever.
1224 */
1225 if (so->so_state & SS_FCANTRCVMORE) {
1226 soisfdisconnected(so);
1227 tp->t_timer[TCPT_2MSL] = tcp_maxidle;
1228 }
1229 tp->t_state = TCPS_FIN_WAIT_2;
1230 }
1231 break;
1232
1233 /*
1234 * In CLOSING STATE in addition to the processing for
1235 * the ESTABLISHED state if the ACK acknowledges our FIN
1236 * then enter the TIME-WAIT state, otherwise ignore
1237 * the segment.
1238 */
1239 case TCPS_CLOSING:
1240 if (ourfinisacked) {
1241 tp->t_state = TCPS_TIME_WAIT;
1242 tcp_canceltimers(tp);
1243 tp->t_timer[TCPT_2MSL] = 2 * TCPTV_MSL;
1244 soisfdisconnected(so);
1245 }
1246 break;
1247
1248 /*
1249 * In LAST_ACK, we may still be waiting for data to drain
1250 * and/or to be acked, as well as for the ack of our FIN.
1251 * If our FIN is now acknowledged, delete the TCB,
1252 * enter the closed state and return.
1253 */
1254 case TCPS_LAST_ACK:
1255 if (ourfinisacked) {
1256 tp = tcp_close(tp);
1257 goto drop;
1258 }
1259 break;
1260
1261 /*
1262 * In TIME_WAIT state the only thing that should arrive
1263 * is a retransmission of the remote FIN. Acknowledge
1264 * it and restart the finack timer.
1265 */
1266 case TCPS_TIME_WAIT:
1267 tp->t_timer[TCPT_2MSL] = 2 * TCPTV_MSL;
1268 goto dropafterack;
1269 }
1270 } /* switch(tp->t_state) */
1271
1272 step6:
1273 /*
1274 * Update window information.
1275 * Don't look at window if no ACK: TAC's send garbage on first SYN.
1276 */
1277 if ((tiflags & TH_ACK) &&
1278 (SEQ_LT(tp->snd_wl1, ti->ti_seq) ||
1279 (tp->snd_wl1 == ti->ti_seq && (SEQ_LT(tp->snd_wl2, ti->ti_ack) ||
1280 (tp->snd_wl2 == ti->ti_ack && tiwin > tp->snd_wnd))))) {
1281 /* keep track of pure window updates */
1282 if (ti->ti_len == 0 &&
1283 tp->snd_wl2 == ti->ti_ack && tiwin > tp->snd_wnd)
1284 tcpstat.tcps_rcvwinupd++;
1285 tp->snd_wnd = tiwin;
1286 tp->snd_wl1 = ti->ti_seq;
1287 tp->snd_wl2 = ti->ti_ack;
1288 if (tp->snd_wnd > tp->max_sndwnd)
1289 tp->max_sndwnd = tp->snd_wnd;
1290 needoutput = 1;
1291 }
1292
1293 /*
1294 * Process segments with URG.
1295 */
1296 if ((tiflags & TH_URG) && ti->ti_urp &&
1297 TCPS_HAVERCVDFIN(tp->t_state) == 0) {
1298 /*
1299 * This is a kludge, but if we receive and accept
1300 * random urgent pointers, we'll crash in
1301 * soreceive. It's hard to imagine someone
1302 * actually wanting to send this much urgent data.
1303 */
1304 if (ti->ti_urp + so->so_rcv.sb_cc > so->so_rcv.sb_datalen) {
1305 ti->ti_urp = 0;
1306 tiflags &= ~TH_URG;
1307 goto dodata;
1308 }
1309 /*
1310 * If this segment advances the known urgent pointer,
1311 * then mark the data stream. This should not happen
1312 * in CLOSE_WAIT, CLOSING, LAST_ACK or TIME_WAIT STATES since
1313 * a FIN has been received from the remote side.
1314 * In these states we ignore the URG.
1315 *
1316 * According to RFC961 (Assigned Protocols),
1317 * the urgent pointer points to the last octet
1318 * of urgent data. We continue, however,
1319 * to consider it to indicate the first octet
1320 * of data past the urgent section as the original
1321 * spec states (in one of two places).
1322 */
1323 if (SEQ_GT(ti->ti_seq+ti->ti_urp, tp->rcv_up)) {
1324 tp->rcv_up = ti->ti_seq + ti->ti_urp;
1325 so->so_urgc = so->so_rcv.sb_cc +
1326 (tp->rcv_up - tp->rcv_nxt); /* -1; */
1327 tp->rcv_up = ti->ti_seq + ti->ti_urp;
1328
1329 }
1330 } else
1331 /*
1332 * If no out of band data is expected,
1333 * pull receive urgent pointer along
1334 * with the receive window.
1335 */
1336 if (SEQ_GT(tp->rcv_nxt, tp->rcv_up))
1337 tp->rcv_up = tp->rcv_nxt;
1338 dodata:
1339
1340 /*
1341 * Process the segment text, merging it into the TCP sequencing queue,
1342 * and arranging for acknowledgment of receipt if necessary.
1343 * This process logically involves adjusting tp->rcv_wnd as data
1344 * is presented to the user (this happens in tcp_usrreq.c,
1345 * case PRU_RCVD). If a FIN has already been received on this
1346 * connection then we just ignore the text.
1347 */
1348 if ((ti->ti_len || (tiflags&TH_FIN)) &&
1349 TCPS_HAVERCVDFIN(tp->t_state) == 0) {
1350 TCP_REASS(tp, ti, m, so, tiflags);
1351 /*
1352 * Note the amount of data that peer has sent into
1353 * our window, in order to estimate the sender's
1354 * buffer size.
1355 */
1356 len = so->so_rcv.sb_datalen - (tp->rcv_adv - tp->rcv_nxt);
1357 } else {
1358 m_free(m);
1359 tiflags &= ~TH_FIN;
1360 }
1361
1362 /*
1363 * If FIN is received ACK the FIN and let the user know
1364 * that the connection is closing.
1365 */
1366 if (tiflags & TH_FIN) {
1367 if (TCPS_HAVERCVDFIN(tp->t_state) == 0) {
1368 /*
1369 * If we receive a FIN we can't send more data,
1370 * set it SS_FDRAIN
1371 * Shutdown the socket if there is no rx data in the
1372 * buffer.
1373 * soread() is called on completion of shutdown() and
1374 * will go to TCPS_LAST_ACK, and use tcp_output()
1375 * to send the FIN.
1376 */
1377 /* sofcantrcvmore(so); */
1378 sofwdrain(so);
1379
1380 tp->t_flags |= TF_ACKNOW;
1381 tp->rcv_nxt++;
1382 }
1383 switch (tp->t_state) {
1384
1385 /*
1386 * In SYN_RECEIVED and ESTABLISHED STATES
1387 * enter the CLOSE_WAIT state.
1388 */
1389 case TCPS_SYN_RECEIVED:
1390 case TCPS_ESTABLISHED:
1391 if(so->so_emu == EMU_CTL) /* no shutdown on socket */
1392 tp->t_state = TCPS_LAST_ACK;
1393 else
1394 tp->t_state = TCPS_CLOSE_WAIT;
1395 break;
1396
1397 /*
1398 * If still in FIN_WAIT_1 STATE FIN has not been acked so
1399 * enter the CLOSING state.
1400 */
1401 case TCPS_FIN_WAIT_1:
1402 tp->t_state = TCPS_CLOSING;
1403 break;
1404
1405 /*
1406 * In FIN_WAIT_2 state enter the TIME_WAIT state,
1407 * starting the time-wait timer, turning off the other
1408 * standard timers.
1409 */
1410 case TCPS_FIN_WAIT_2:
1411 tp->t_state = TCPS_TIME_WAIT;
1412 tcp_canceltimers(tp);
1413 tp->t_timer[TCPT_2MSL] = 2 * TCPTV_MSL;
1414 soisfdisconnected(so);
1415 break;
1416
1417 /*
1418 * In TIME_WAIT state restart the 2 MSL time_wait timer.
1419 */
1420 case TCPS_TIME_WAIT:
1421 tp->t_timer[TCPT_2MSL] = 2 * TCPTV_MSL;
1422 break;
1423 }
1424 }
1425
1426 /*
1427 * If this is a small packet, then ACK now - with Nagle
1428 * congestion avoidance sender won't send more until
1429 * he gets an ACK.
1430 *
1431 * See above.
1432 */
1433 /* if (ti->ti_len && (unsigned)ti->ti_len < tp->t_maxseg) {
1434 */
1435 /* if ((ti->ti_len && (unsigned)ti->ti_len < tp->t_maxseg &&
1436 * (so->so_iptos & IPTOS_LOWDELAY) == 0) ||
1437 * ((so->so_iptos & IPTOS_LOWDELAY) &&
1438 * ((struct tcpiphdr_2 *)ti)->first_char == (char)27)) {
1439 */
1440 if (ti->ti_len && (unsigned)ti->ti_len <= 5 &&
1441 ((struct tcpiphdr_2 *)ti)->first_char == (char)27) {
1442 tp->t_flags |= TF_ACKNOW;
1443 }
1444
1445 /*
1446 * Return any desired output.
1447 */
1448 if (needoutput || (tp->t_flags & TF_ACKNOW)) {
1449 (void) tcp_output(tp);
1450 }
1451 return;
1452
1453 dropafterack:
1454 /*
1455 * Generate an ACK dropping incoming segment if it occupies
1456 * sequence space, where the ACK reflects our state.
1457 */
1458 if (tiflags & TH_RST)
1459 goto drop;
1460 m_freem(m);
1461 tp->t_flags |= TF_ACKNOW;
1462 (void) tcp_output(tp);
1463 return;
1464
1465 dropwithreset:
1466 /* reuses m if m!=NULL, m_free() unnecessary */
1467 if (tiflags & TH_ACK)
1468 tcp_respond(tp, ti, m, (tcp_seq)0, ti->ti_ack, TH_RST);
1469 else {
1470 if (tiflags & TH_SYN) ti->ti_len++;
1471 tcp_respond(tp, ti, m, ti->ti_seq+ti->ti_len, (tcp_seq)0,
1472 TH_RST|TH_ACK);
1473 }
1474
1475 return;
1476
1477 drop:
1478 /*
1479 * Drop space held by incoming segment and return.
1480 */
1481 m_free(m);
1482
1483 return;
1484 }
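/*
 * Editorial sketch, not part of the original file: the duplicate-ACK
 * branch of tcp_input() above is easier to follow with the window
 * arithmetic pulled out on its own. struct cc_state, on_dup_ack(),
 * on_new_ack() and REXMT_THRESH are hypothetical names, and the
 * threshold of 3 is an assumed value for tcprexmtthresh. The actual
 * retransmission (tcp_output() with cwnd briefly set to one segment)
 * and the retransmit-timer check are deliberately left out.
 */

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the tcpcb fields used here. */
struct cc_state {
    unsigned snd_wnd;      /* peer's advertised window, bytes */
    unsigned snd_cwnd;     /* congestion window, bytes */
    unsigned snd_ssthresh; /* slow-start threshold, bytes */
    unsigned t_maxseg;     /* maximum segment size, bytes */
    int      t_dupacks;    /* consecutive duplicate ACKs seen */
};

enum { REXMT_THRESH = 3 }; /* assumption: value of tcprexmtthresh */

/* Handle one duplicate ACK; returns 1 when the caller should retransmit. */
static int on_dup_ack(struct cc_state *s)
{
    assert(s->t_maxseg > 0);
    if (++s->t_dupacks == REXMT_THRESH) {
        unsigned flight = s->snd_wnd < s->snd_cwnd ? s->snd_wnd : s->snd_cwnd;
        unsigned win = flight / 2 / s->t_maxseg; /* half the window, in segments */
        if (win < 2)
            win = 2;
        s->snd_ssthresh = win * s->t_maxseg;
        /* Inflate cwnd by one segment per dup ACK already received. */
        s->snd_cwnd = s->snd_ssthresh + s->t_maxseg * (unsigned)s->t_dupacks;
        return 1;
    }
    if (s->t_dupacks > REXMT_THRESH)
        s->snd_cwnd += s->t_maxseg; /* each further dup ACK frees one segment */
    return 0;
}

/* New data was acknowledged: retract any inflation and reset the counter. */
static void on_new_ack(struct cc_state *s)
{
    if (s->t_dupacks > REXMT_THRESH && s->snd_cwnd > s->snd_ssthresh)
        s->snd_cwnd = s->snd_ssthresh;
    s->t_dupacks = 0;
}
```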
1485
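/*
 * Editorial sketch, not part of the original file: the rule tcp_input()
 * uses to open the congestion window when new data is acked, written
 * as a pure function. open_cwnd() and its parameter names are
 * hypothetical; max_win stands in for TCP_MAXWIN << tp->snd_scale.
 * At or below ssthresh the window grows by one segment per ACK (slow
 * start); above it the increment is maxseg * maxseg / cwnd, which adds
 * up to roughly one segment per window of ACKs.
 */

```c
#include <assert.h>

/* Returns the congestion window after one ACK of new data. */
static unsigned open_cwnd(unsigned cwnd, unsigned ssthresh,
                          unsigned maxseg, unsigned max_win)
{
    unsigned incr = maxseg;
    unsigned next;

    assert(cwnd > 0);
    if (cwnd > ssthresh)
        incr = incr * incr / cwnd; /* linear growth: maxseg^2 / cwnd per ACK */
    next = cwnd + incr;
    return next < max_win ? next : max_win;
}
```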
1486 /* , ts_present, ts_val, ts_ecr) */
1487 /* int *ts_present;
1488 * u_int32_t *ts_val, *ts_ecr;
1489 */
1490 void
1491 tcp_dooptions(tp, cp, cnt, ti)
1492 struct tcpcb *tp;
1493 u_char *cp;
1494 int cnt;
1495 struct tcpiphdr *ti;
1496 {
1497 u_int16_t mss;
1498 int opt, optlen;
1499
1500 DEBUG_CALL("tcp_dooptions");
1501 DEBUG_ARGS((dfd," tp = %lx cnt=%i \n", (long )tp, cnt));
1502
1503 for (; cnt > 0; cnt -= optlen, cp += optlen) {
1504 opt = cp[0];
1505 if (opt == TCPOPT_EOL)
1506 break;
1507 if (opt == TCPOPT_NOP)
1508 optlen = 1;
1509 else {
1510 optlen = (cnt < 2) ? 0 : cp[1];
1511 if (optlen < 2 || optlen > cnt) /* malformed or truncated option */
1512 break;
1513 }
1514 switch (opt) {
1515
1516 default:
1517 continue;
1518
1519 case TCPOPT_MAXSEG:
1520 if (optlen != TCPOLEN_MAXSEG)
1521 continue;
1522 if (!(ti->ti_flags & TH_SYN))
1523 continue;
1524 memcpy((char *) &mss, (char *) cp + 2, sizeof(mss));
1525 NTOHS(mss);
1526 (void) tcp_mss(tp, mss); /* sets t_maxseg */
1527 break;
1528
1529 /* case TCPOPT_WINDOW:
1530 * if (optlen != TCPOLEN_WINDOW)
1531 * continue;
1532 * if (!(ti->ti_flags & TH_SYN))
1533 * continue;
1534 * tp->t_flags |= TF_RCVD_SCALE;
1535 * tp->requested_s_scale = min(cp[2], TCP_MAX_WINSHIFT);
1536 * break;
1537 */
1538 /* case TCPOPT_TIMESTAMP:
1539 * if (optlen != TCPOLEN_TIMESTAMP)
1540 * continue;
1541 * *ts_present = 1;
1542 * memcpy((char *) ts_val, (char *)cp + 2, sizeof(*ts_val));
1543 * NTOHL(*ts_val);
1544 * memcpy((char *) ts_ecr, (char *)cp + 6, sizeof(*ts_ecr));
1545 * NTOHL(*ts_ecr);
1546 *
1547 */ /*
1548 * * A timestamp received in a SYN makes
1549 * * it ok to send timestamp requests and replies.
1550 * */
1551 /* if (ti->ti_flags & TH_SYN) {
1552 * tp->t_flags |= TF_RCVD_TSTMP;
1553 * tp->ts_recent = *ts_val;
1554 * tp->ts_recent_age = tcp_now;
1555 * }
1556 */ break;
1557 }
1558 }
1559 }
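/*
 * Editorial sketch, not part of the original file: a standalone walk
 * over a TCP option list in the style of tcp_dooptions(), returning
 * only the MSS. parse_mss() and the OPT_ and OLEN_ names are
 * hypothetical; the kind and length values (EOL 0, NOP 1, MSS kind 2
 * with length 4) are the standard TCP ones. The sketch bounds-checks
 * every option against the bytes remaining, and it omits the check
 * that the segment carries SYN.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPT_EOL = 0, OPT_NOP = 1, OPT_MAXSEG = 2, OLEN_MAXSEG = 4 };

/* Returns the MSS in host byte order, or 0 if no well-formed MSS option is found. */
static uint16_t parse_mss(const uint8_t *cp, size_t cnt)
{
    assert(cp != NULL || cnt == 0);
    while (cnt > 0) {
        uint8_t opt = cp[0];
        size_t optlen;

        if (opt == OPT_EOL)
            break;
        if (opt == OPT_NOP) {
            optlen = 1;
        } else {
            if (cnt < 2)
                break;              /* no room for a length byte */
            optlen = cp[1];
            if (optlen < 2 || optlen > cnt)
                break;              /* malformed or truncated option */
        }
        if (opt == OPT_MAXSEG && optlen == OLEN_MAXSEG)
            return (uint16_t)((cp[2] << 8) | cp[3]); /* network order on the wire */
        cp += optlen;
        cnt -= optlen;
    }
    return 0;
}
```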
1560
1561
1562 /*
1563 * Pull out of band byte out of a segment so
1564 * it doesn't appear in the user's data queue.
1565 * It is still reflected in the segment length for
1566 * sequencing purposes.
1567 */
1568
1569 #ifdef notdef
1570
1571 void
1572 tcp_pulloutofband(so, ti, m)
1573 struct socket *so;
1574 struct tcpiphdr *ti;
1575 register struct mbuf *m;
1576 {
1577 int cnt = ti->ti_urp - 1;
1578
1579 while (cnt >= 0) {
1580 if (m->m_len > cnt) {
1581 char *cp = mtod(m, caddr_t) + cnt;
1582 struct tcpcb *tp = sototcpcb(so);
1583
1584 tp->t_iobc = *cp;
1585 tp->t_oobflags |= TCPOOB_HAVEDATA;
1586 memmove(cp, cp+1, (unsigned)(m->m_len - cnt - 1));
1587 m->m_len--;
1588 return;
1589 }
1590 cnt -= m->m_len;
1591 m = m->m_next; /* XXX WRONG! Fix it! */
1592 if (m == 0)
1593 break;
1594 }
1595 panic("tcp_pulloutofband");
1596 }
1597
1598 #endif /* notdef */
1599
1600 /*
1601 * Collect new round-trip time estimate
1602 * and update averages and current timeout.
1603 */
1604
1605 void
1606 tcp_xmit_timer(tp, rtt)
1607 register struct tcpcb *tp;
1608 int rtt;
1609 {
1610 register short delta;
1611
1612 DEBUG_CALL("tcp_xmit_timer");
1613 DEBUG_ARG("tp = %lx", (long)tp);
1614 DEBUG_ARG("rtt = %d", rtt);
1615
1616 tcpstat.tcps_rttupdated++;
1617 if (tp->t_srtt != 0) {
1618 /*
1619 * srtt is stored as fixed point with 3 bits after the
1620 * binary point (i.e., scaled by 8). The following magic
1621 * is equivalent to the smoothing algorithm in rfc793 with
1622 * an alpha of .875 (srtt = rtt/8 + srtt*7/8 in fixed
1623 * point). Adjust rtt to origin 0.
1624 */
1625 delta = rtt - 1 - (tp->t_srtt >> TCP_RTT_SHIFT);
1626 if ((tp->t_srtt += delta) <= 0)
1627 tp->t_srtt = 1;
1628 /*
1629 * We accumulate a smoothed rtt variance (actually, a
1630 * smoothed mean difference), then set the retransmit
1631 * timer to smoothed rtt + 4 times the smoothed variance.
1632 * rttvar is stored as fixed point with 2 bits after the
1633 * binary point (scaled by 4). The following is
1634 * equivalent to rfc793 smoothing with an alpha of .75
1635 * (rttvar = rttvar*3/4 + |delta| / 4). This replaces
1636 * rfc793's wired-in beta.
1637 */
1638 if (delta < 0)
1639 delta = -delta;
1640 delta -= (tp->t_rttvar >> TCP_RTTVAR_SHIFT);
1641 if ((tp->t_rttvar += delta) <= 0)
1642 tp->t_rttvar = 1;
1643 } else {
1644 /*
1645 * No rtt measurement yet - use the unsmoothed rtt.
1646 * Set the variance to half the rtt (so our first
1647 * retransmit happens at 3*rtt).
1648 */
1649 tp->t_srtt = rtt << TCP_RTT_SHIFT;
1650 tp->t_rttvar = rtt << (TCP_RTTVAR_SHIFT - 1);
1651 }
1652 tp->t_rtt = 0;
1653 tp->t_rxtshift = 0;
1654
1655 /*
1656 * The retransmit should happen at rtt + 4 * rttvar.
1657 * Because of the way we do the smoothing, srtt and rttvar
1658 * will each average +1/2 tick of bias. When we compute
1659 * the retransmit timer, we want 1/2 tick of rounding and
1660 * 1 extra tick because of +-1/2 tick uncertainty in the
1661 * firing of the timer. The bias will give us exactly the
1662 * 1.5 tick we need. But, because the bias is
1663 * statistical, we have to test that we don't drop below
1664 * the minimum feasible timer (which is 2 ticks).
1665 */
1666 TCPT_RANGESET(tp->t_rxtcur, TCP_REXMTVAL(tp),
1667 (short)tp->t_rttmin, TCPTV_REXMTMAX); /* XXX */
1668
1669 /*
1670 * We received an ack for a packet that wasn't retransmitted;
1671 * it is probably safe to discard any error indications we've
1672 * received recently. This isn't quite right, but close enough
1673 * for now (a route might have failed after we sent a segment,
1674 * and the return path might not be symmetrical).
1675 */
1676 tp->t_softerror = 0;
1677 }
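/*
 * Editorial sketch, not part of the original file: the fixed-point
 * smoothing in tcp_xmit_timer() reduced to two small functions so the
 * scaling can be checked by hand. struct rtt_state, rtt_update() and
 * rexmt_value() are hypothetical names. The shifts of 3 and 2 follow
 * the comments above (srtt scaled by 8, rttvar scaled by 4), and
 * rexmt_value() assumes the retransmit value is (srtt >> 3) + rttvar.
 * Clamping to the minimum and maximum timer values is left out.
 */

```c
#include <assert.h>

enum { RTT_SHIFT = 3, RTTVAR_SHIFT = 2 };

struct rtt_state {
    int srtt;   /* smoothed RTT, scaled by 8 */
    int rttvar; /* smoothed mean deviation, scaled by 4 */
};

/* Fold one RTT sample (in timer ticks, >= 1) into the running estimates. */
static void rtt_update(struct rtt_state *s, int rtt)
{
    assert(rtt > 0);
    if (s->srtt != 0) {
        int delta = rtt - 1 - (s->srtt >> RTT_SHIFT);

        s->srtt += delta;               /* srtt = 7/8 srtt + 1/8 sample */
        if (s->srtt <= 0)
            s->srtt = 1;
        if (delta < 0)
            delta = -delta;
        delta -= s->rttvar >> RTTVAR_SHIFT;
        s->rttvar += delta;             /* rttvar = 3/4 rttvar + 1/4 |delta| */
        if (s->rttvar <= 0)
            s->rttvar = 1;
    } else {
        /* First sample: take it unsmoothed, variance = rtt / 2. */
        s->srtt = rtt << RTT_SHIFT;
        s->rttvar = rtt << (RTTVAR_SHIFT - 1);
    }
}

/* Retransmit timeout in ticks: srtt + 4 * rttvar, in unscaled units. */
static int rexmt_value(const struct rtt_state *s)
{
    return (s->srtt >> RTT_SHIFT) + s->rttvar;
}
```

/*
 * With a first sample of 4 ticks this gives a timeout of 12 ticks,
 * matching the "first retransmit happens at 3*rtt" remark above.
 */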
1678
1679 /*
1680 * Determine a reasonable value for maxseg size.
1681 * If the route is known, check route for mtu.
1682 * If none, use an mss that can be handled on the outgoing
1683 * interface without forcing IP to fragment; if bigger than
1684 * an mbuf cluster (MCLBYTES), round down to nearest multiple of MCLBYTES
1685 * to utilize large mbufs. If no route is found, route has no mtu,
1686 * or the destination isn't local, use a default, hopefully conservative
1687 * size (usually 512 or the default IP max size, but no more than the mtu
1688 * of the interface), as we can't discover anything about intervening
1689 * gateways or networks. We also initialize the congestion/slow start
1690 * window to be a single segment if the destination isn't local.
1691 * While looking at the routing entry, we also initialize other path-dependent
1692 * parameters from pre-set or cached values in the routing entry.
1693 */
1694
1695 int
1696 tcp_mss(tp, offer)
1697 register struct tcpcb *tp;
1698 u_int offer;
1699 {
1700 struct socket *so = tp->t_socket;
1701 int mss;
1702
1703 DEBUG_CALL("tcp_mss");
1704 DEBUG_ARG("tp = %lx", (long)tp);
1705 DEBUG_ARG("offer = %d", offer);
1706
1707 mss = min(if_mtu, if_mru) - sizeof(struct tcpiphdr);
1708 if (offer)
1709 mss = min(mss, offer);
1710 mss = max(mss, 32);
1711 if (mss < tp->t_maxseg || offer != 0)
1712 tp->t_maxseg = mss;
1713
1714 tp->snd_cwnd = mss;
1715
1716 sbreserve(&so->so_snd, tcp_sndspace+((tcp_sndspace%mss)?(mss-(tcp_sndspace%mss)):0));
1717 sbreserve(&so->so_rcv, tcp_rcvspace+((tcp_rcvspace%mss)?(mss-(tcp_rcvspace%mss)):0));
1718
1719 DEBUG_MISC((dfd, " returning mss = %d\n", mss));
1720
1721 return mss;
1722 }
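/*
 * Editorial sketch, not part of the original file: the two pieces of
 * arithmetic in tcp_mss() as pure functions. clamp_mss() and
 * round_up_to_mss() are hypothetical names; hdr stands in for
 * sizeof(struct tcpiphdr), whose real size depends on the build.
 * Unlike the original, clamp_mss() works in unsigned arithmetic and
 * guards against the link size being smaller than the header.
 */

```c
#include <assert.h>

/* Pick an MSS from the link limits and the peer's offer (0 = no offer). */
static unsigned clamp_mss(unsigned mtu, unsigned mru, unsigned hdr, unsigned offer)
{
    unsigned link = mtu < mru ? mtu : mru;
    unsigned mss = link > hdr ? link - hdr : 0;

    if (offer && offer < mss)
        mss = offer;
    if (mss < 32)
        mss = 32;       /* same floor as tcp_mss() */
    return mss;
}

/* Round a buffer size up to the next whole multiple of mss. */
static unsigned round_up_to_mss(unsigned space, unsigned mss)
{
    unsigned rem;

    assert(mss > 0);
    rem = space % mss;
    return rem ? space + (mss - rem) : space;
}
```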