CCR Papers from January 2008

  • Xenofontas Dimitropoulos, Paul Hurley, and Andreas Kind

    Knowledge of the largest traffic flows in a network is important for many network management applications. The problem of finding these flows is known as the heavy-hitter problem and has been the subject of many studies in recent years. One of the most efficient and well-known algorithms for finding heavy hitters is lossy counting [29].

    In this work we introduce probabilistic lossy counting (PLC), which enhances lossy counting for computing network traffic heavy hitters. PLC uses a tighter error bound on the estimated sizes of traffic flows and provides probabilistic rather than deterministic guarantees on its accuracy. The probabilistic error bound substantially reduces the memory consumption of the algorithm. In addition, PLC reduces the rate of false positives of lossy counting and achieves a low estimation error, albeit slightly higher than that of lossy counting.

    We compare PLC with state-of-the-art algorithms for finding heavy hitters. Our experiments on real traffic traces show that PLC has 1) between 34.4% and 74% lower memory consumption, 2) between 37.9% and 40.5% fewer false positives than lossy counting, and 3) a small estimation error.
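    The lossy counting baseline that PLC improves on can be sketched in a few lines. This is the classic algorithm of Manku and Motwani, not PLC itself; the function and variable names below are our own illustration.

```python
from math import ceil

def lossy_counting(stream, epsilon):
    """Classic lossy counting: approximate per-item counts using a
    bounded table. Each entry maps item -> (count, delta), where
    delta upper-bounds the count the item may have accumulated
    before it was (re)inserted into the table."""
    w = ceil(1.0 / epsilon)          # bucket width
    entries = {}                     # item -> (count, delta)
    b_current = 1                    # id of the current bucket
    for n, item in enumerate(stream, start=1):
        if item in entries:
            count, delta = entries[item]
            entries[item] = (count + 1, delta)
        else:
            entries[item] = (1, b_current - 1)
        if n % w == 0:               # bucket boundary: prune small entries
            entries = {k: (c, d) for k, (c, d) in entries.items()
                       if c + d > b_current}
            b_current += 1
    return entries

def heavy_hitters(entries, support, epsilon, n):
    """Report items whose true frequency may exceed support * n."""
    return {k for k, (c, _) in entries.items() if c >= (support - epsilon) * n}
```

    PLC's key change, per the abstract, is to replace the deterministic delta term with a probabilistic bound, which lets entries be pruned earlier and memory consumption shrink.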

    Pablo Rodriguez
  • Fang-Chun Kuo and Xiaoming Fu

    An aggregate congestion control mechanism, Probe-Aided MulTCP (PA-MulTCP), is proposed in this paper. It is based on MulTCP, a proposal for enabling an aggregate to emulate the behavior of multiple concurrent TCP connections. The objective of PA-MulTCP is to ensure fair sharing of the bottleneck bandwidth between the aggregate and other TCP or TCP-friendly flows while remaining lightweight and responsive. Unlike MulTCP, PA-MulTCP maintains two congestion window loops: the probe window loop and the adjusting window loop. The probe window loop constantly probes the congestion situation, and the adjusting window loop dynamically adjusts the congestion window size as flows within the aggregate arrive and depart. Our simulations demonstrate that PA-MulTCP is more stable and fairer than MulTCP over a wide range of the weight N, in steady conditions as well as under varying congestion. PA-MulTCP is also responsive to flow arrival/departure and thus reduces the latency of short-lived transfers. Furthermore, PA-MulTCP is lightweight: it achieves the above advantages at the cost of only an extra probe window loop, which has a marginal influence on implementation complexity. Finally, the design of PA-MulTCP decouples congestion management from the other functionalities of aggregate flow management. As a result, PA-MulTCP could potentially be applied to a wider range of scenarios, e.g., wireless TCP proxies, edge-to-edge overlays, QoS provisioning, and mass data transport.
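    As a rough illustration of the MulTCP behavior that PA-MulTCP builds on, the following toy model (our own sketch, not the paper's algorithm) shows a congestion-avoidance update for an aggregate of weight N emulating N TCP flows:

```python
def multcp_update(cwnd, n, event):
    """Toy MulTCP-style window update for an aggregate of weight N:
    per ACK the window grows N times faster than a single TCP flow;
    on loss it backs off by a factor (1 - 1/(2N)) instead of the
    single-flow factor of 1/2."""
    if event == "ack":
        return cwnd + n / cwnd                     # +N packets per RTT overall
    if event == "loss":
        return max(1.0, cwnd * (1.0 - 1.0 / (2.0 * n)))
    raise ValueError("event must be 'ack' or 'loss'")
```

    PA-MulTCP's contribution is to drive such an update from two loops: a probe window that samples the congestion state and an adjusting window that rescales as flows join and leave the aggregate.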

    Jon Crowcroft
  • Constantine Dovrolis

    As significant resources are directed towards clean-slate networking research, it is imperative to understand how clean-slate architectural research compares to the diametrically opposite paradigm of evolutionary research. This paper approaches the “evolution versus clean-slate” debate through a biological metaphor. We argue that evolutionary research can lead to less costly (more competitive) and more robust designs than clean-slate architectural research. We also argue that the Internet architecture is not ossified, as recently claimed, but that its core protocols play the role of “evolutionary kernels”, meaning that they are conserved so that complexity and diversity can emerge at the lower and higher layers. We then discuss the factors that determine the deployment of new architectures or protocols, and argue, based on the notion of “auto-catalytic sets”, that successful innovations are those that become synergistic components in closed loops of existing modules. The paper closes by emphasizing the role of evolutionary Internet research.

  • Haakon Ringberg, Augustin Soule, and Jennifer Rexford

    Despite the flurry of anomaly-detection papers in recent years, effective ways to validate and compare proposed solutions have remained elusive. We argue that evaluating anomaly detectors on manually labeled traces is both important and unavoidable. In particular, it is important to evaluate detectors on traces from operational networks because it is in this setting that the detectors must ultimately succeed. In addition, manual labeling of such traces is unavoidable because new anomalies will be identified and characterized from manual inspection long before there are realistic models for them. It is well known, however, that manual labeling is slow and error-prone. In order to mitigate these challenges, we present WebClass, a web-based infrastructure that adds rigor to the manual labeling process. WebClass allows researchers to share, inspect, and label traffic time series through a common graphical user interface. We are releasing WebClass to the research community in the hope that it will foster greater collaboration in creating labeled traces, and that the traces will be of higher quality because the entire community has access to all the information that led to a given label.

  • Guillaume Urvoy-Keller, Taoufik En-Najjary, and Alessandro Sorniotti

    The available bandwidth of a path directly impacts the performance of throughput-sensitive applications, e.g., p2p content replication or podcasting. Several tools have been devised to estimate available bandwidth. The vast majority of these tools follow either the Probe Rate Model (PRM) or the Probe Gap Model (PGM).

    Lao et al. [6] and Liu et al. [7] have identified biases in the PGM approach that lead to consistent underestimations of the available bandwidth. Those results were obtained under the ideal assumption of stationary cross traffic.

    In this note, we confirm the existence of these biases experimentally, i.e., in the case of non-stationary cross traffic. To do so, we compare one representative of the PRM family, namely Pathload, with one representative of the PGM family, namely Spruce, using long-term (several-day-long) traces collected on an example path.

    We first propose a methodology for comparing the operational results of two available bandwidth measurement tools. Based on the sanitized data obtained using this methodology, we then show that the biases identified by previous works are clearly observable over the long term, even with non-stationary cross traffic. We further uncover the formal link between the work of Liu et al. and that of Lao et al.
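    For context, the Probe Gap Model estimate used by Spruce-like tools fits in a few lines. The symbols C (bottleneck capacity), g_in and g_out (probe-pair gaps at sender and receiver) follow the usual PGM formulation; the underestimation biases discussed above arise when cross traffic violates the model's fluid, single-bottleneck assumptions.

```python
def pgm_available_bandwidth(capacity, g_in, g_out):
    """Probe Gap Model estimate (Spruce-style): the expansion of the
    inter-packet gap across the bottleneck reveals the cross-traffic
    rate, and the available bandwidth is what remains of the capacity:
        A = C - C * (g_out - g_in) / g_in
    """
    cross_traffic = capacity * (g_out - g_in) / g_in
    return capacity - cross_traffic
```

    For example, if a 1 ms input gap grows to 1.5 ms across a 100 Mb/s bottleneck, the model infers 50 Mb/s of cross traffic and hence 50 Mb/s available.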

  • Tristan Henderson

    The networking research community lacks a tradition of sharing experimental data, or using such data for reproducing results. But are we really that bad? Are we worse than researchers in other fields? And if so, how can we do better?

  • Jon Crowcroft

    A major contribution to global warming has been the number of new workshops publishing proceedings with the prefix “hot”. In this article, I propose that we counter this trend, in an attempt to remain carbon neutral, with a set of “anti-workshops” on cold topics.

    We suggest a number of heuristics for detecting when a topic has gone cold, and give some examples of their application. Of course, some cold topics warm up again, and so there is a risk of over-damping in our heuristics. Over a long period, dynamic equilibrium should be assured, but from time to time our scheme may prejudice against surprising results in boring areas of communications research. Nevertheless, it may leave room for more surprising results in interesting areas of research, which cannot be a bad thing.

  • Nikolaos Laoutaris, Pablo Rodriguez, and Laurent Massoulie

    In this paper we propose a radical solution to data hosting and delivery for the Internet of the future. The current data delivery architecture is “network centric”, with content stored in data centers connected directly to Internet backbones. This approach has multiple drawbacks, among which the complexity of deploying data centers, power consumption, and lack of scalability are the most critical. We propose an innovative approach orthogonal to traditional data centers, through what we call “nano” data centers: boxes deployed at the edge of the network (e.g., in home gateways, set-top boxes, etc.) that cooperate in a peer-to-peer manner. Unlike traditional peer-to-peer clients, however, our nano data centers operate under a common management authority, e.g., the ISP that installs and maintains the set-top boxes, and can thus cooperate more effectively and achieve higher aggregate performance. Nano data centers are, therefore, better suited for providing guaranteed quality to emerging applications such as online gaming, interactive IPTV and VoD, and user-generated content.

  • Haakon Ringberg, Matthew Roughan, and Jennifer Rexford

    Anomalous events that affect the performance of networks are a fact of life. It is therefore not surprising that recent years have seen an explosion in research on network anomaly detection. What is quite surprising, however, is the lack of controlled evaluation of these detectors. In this paper we argue that there are numerous important questions regarding the effectiveness of anomaly detectors that cannot be answered by the evaluation techniques employed today. We present four central requirements of a rigorous evaluation that can only be met by simulating both the anomaly and its surrounding environment. While simulation is necessary, it is not sufficient. We therefore present an outline of an evaluation methodology that leverages both simulation and traces from operational networks.
