Misbehaviors in TCP SACK Generation

By: 
Nasif Ekiz, Abuthahir Habeeb Rahman, and Paul D. Amer
Appears in: 
CCR April 2011

While analyzing CAIDA Internet traces of TCP traffic to detect instances of data reneging, we frequently observed seven misbehaviors in the generation of SACKs. These misbehaviors could result in a data sender mistakenly thinking data reneging occurred. With one misbehavior, the worst case could result in a data sender receiving a SACK for data that was transmitted but never received. This paper presents a methodology, and its application using TBIT, to test a wide range of operating systems and fingerprint which ones misbehave in each of the seven ways. Measuring the performance loss due to these misbehaviors is outside the scope of this study; the goal is to document the misbehaviors so they may be corrected. One can conclude that the handling of SACKs, while simple in concept, is complex to implement.

Public Review By: 
S. Saroiu

This paper presents a study of eight misbehaviors in TCP SACK generation in the TCP/IP stacks of commodity operating systems. The misbehaviors were first identified in a study of CAIDA TCP traffic traces that the authors conducted in previous work. After describing the misbehaviors, the paper uses TBIT to implement a number of tests to detect whether these misbehaviors appear in Windows, Linux, Mac OS X, FreeBSD, OpenBSD, Solaris, and OpenSolaris. The bad news is that most operating systems exhibit one or more of these misbehaviors. The good news is that newer versions of each OS exhibit fewer misbehaviors than older ones.
Seven out of the eight misbehaviors discussed do not affect correctness, and the eighth one manifests only in Solaris and OpenSolaris. In fact, as all reviewers pointed out, most of these misbehaviors are due to the lack of (or incorrect) implementation of SHOULD clauses in several RFCs. For example, in one case, a host sends fewer TCP SACK blocks than it could; in another, a host does not include SACK information in FIN segments. The most serious misbehavior is due to SACK information from earlier connections appearing in later ones (definitely a bug).
The paper is interesting in two ways. First, it is informative about the state of TCP SACK generation in TCP/IP stacks today. Second, it is a great example of how complex algorithms are incredibly hard to “get right” in practice. Personally, I’d even consider using this paper in an undergraduate systems and networking design course to illustrate the effects of complex designs in practice. Although TCP SACK is over a decade old, its implementations are still incomplete in today’s commodity operating systems.