Computer Communication Review: Papers

Find a CCR issue:
  • George Varghese

    The most striking ideas in systems are abstractions such as virtual memory, sockets, or packet scheduling. Algorithmics is the servant of abstraction, allowing the performance of the system to approach that of the underlying hardware. I survey the trajectory of network algorithmics, starting with a focus on speed and scale in the 1990s to measurement and security in the 2000s, using what I call the confluence lens. Confluence sees interdisciplinary work as a merger of two or more disciplines made compelling by an inflection point in the real world, while also producing genuinely transformed ideas. I attempt to show that Network Algorithmics represented a confluence in the 1990s between computer systems, algorithms, and networking. I suggest Confluence Diagrams as a means to identify future interdisciplinary opportunities, and describe the emerging field of Network Verification as a new confluence between networking and programming languages.

  • Aditya Akella

    Dear networking students everywhere, Welcome to the first edition of a new quarterly column aimed at mentoring students in the ACM SIGCOMM community. I’m honored to be the inaugural editor. The primary objective of this column is to offer students advice on general issues pertaining to networking research, teaching, and careers. Most students have excellent advisors, but my hope is that this column will nevertheless help, e.g., by: (a) augmenting advice students get from their advisors, and (b) aiding graduate/undergraduate students who are exploring moving into networking or to a new sub-topic therein. Here are some examples of questions this column is suitable for. This is not an exhaustive list; there are many other relevant issues that are not listed here. • Research methods: “Is there an optimal way to approach networked systems research? What are some things to keep in mind when embarking on active measurement projects?” • Tools, testbeds, datasets: “Where can I do experiments to test my new cool idea for data center networking”? “Are there datasets that will aid me in my research on topic X?” • Time management ahead of conference deadlines: “How do I balance writing vs. hacking/experiments?” • Career advice: “With so much exciting work happening in the industry, is there much point in seeking an academic job? Is a PhD in networking worth it?” • Course choice: “Are there courses I should take to prepare myself for research in topic X? Are there online materials I can use for this?” • Teaching networking: “I’m going to teach my first networking course in the near future. How do I prepare myself? Are there online resources I could use?” • Getting involved: “How do I become a more integral part of the SIGCOMM community?” I will likely not address each individual query; rather my goal is to collate (a subset of) the questions I get, group them into meaningful topics, and offer advice as best as I can. Of course, I am not an expert on many of these topics. Thus, I may seek the opinion of relevant people in the community in responding to specific queries. This column is likely to be just 1-2 pages long. As such, some answers may be brief; e.g., I may simply point folks to online resources that already offer the relevant advice. Note that this is not an advice column of the sort you may find in magazines and Sunday newspapers. I will not address topics that are personal in nature; e.g., if you’re having an issue with your advisor I’m not the one to approach for resolution! I will also not address questions comparing venues (“Is X a better conference than Y?”) or research topics (“Is X a better topic to work on than Y?” or “Is X dead?”). And don’t send me abstracts/papers for feedback! Interested students can send queries to Names of students asking questions will not be published by default, unless students request otherwise. If you’re not a student, and you have feedback on or disagreement with a statement or comment in my column, I’d love to hear from you as well. Looking forward to hearing from you all.

  • Dina Papagiannaki

    Welcome to the October issue of CCR. This issue features the technical and editorial papers that comprise CCR's quarterly content, but also all the papers that appeared at ACM Sigcomm, along with the best papers selected from its affiliated workshops. Sigcomm this year was attended by 733 people, making it the second highest attendance event ACM Sigcomm has ever had, after Hong Kong last year. More interestingly, 23% of all attendees were coming from industry, something which I also noticed during the presentations and the follow up questions. Our conference has become a very vibrant venue for an expanding community. A community, that seems increasingly interested in solving difficult scientific problems while having impact on the technology landscape as a whole. I am really looking forward to the results of such closer collaboration between academic and industrial researchers. The SIGCOMM award was presented to Dr. George Varghese, “for his sustained and diverse contributions to network algorithmics, with far reaching impact in both research and industry.” George’s talk was focused on his framework of structuring his research, such that he indeed solves difficult problems that will change the landscape of technology in a fundamental way. I surely hope to secure an editorial submission from him in a future issue of CCR, where he can describe in his own "written" words his thinking around what he calls "confluence." The main content of the conference revolved around software defined networks, data centers, wireless and cellular networks, security and privacy. Security and trust are also the topics of our three CCR technical papers for this issue. The editorial section comprises the report from the 6th workshop on active Internet measurements, a position paper aiming to define “fog computing,” and the introduction of an interesting open platform for cellular research, OpenAirInterface. I hope you enjoy reading them. Finally, this issue is marking the end of tenure for two of our editors, Dr. Renata Teixeira, and Dr. Sanjay Jha. I wanted to thank both of them for their service to CCR. With that, I hope you enjoy this extended issue of CCR and I am always at your disposal in case of questions/suggestions. Dina Papagiannaki CCR Editor

  • S. Coull, K. Dyer

    Instant messaging services are quickly becoming the most dominant form of communication among consumers around the world. Apple iMessage, for example, handles over 2 billion messages each day, while WhatsApp claims 16 billion messages from 400 million international users. To protect user privacy, many of these services typically implement endto-end and transport layer encryption, which are meant to make eavesdropping infeasible even for the service providers themselves. In this paper, however, we show that it is possible for an eavesdropper to learn information about user actions, the language of messages, and even the length of those messages with greater than 96% accuracy despite the use of state-of-the-art encryption technologies simply by observing the sizes of encrypted packets. While our evaluation focuses on Apple iMessage, the attacks are completely generic and we show how they can be applied to many popular messaging services, including WhatsApp, Viber, and Telegram.

    Joel Sommers
  • C. Ghali, G. Tsudik, E. Uzun

    In contrast to today’s IP-based host-oriented Internet architecture, Information-Centric Networking (ICN) emphasizes content by making it directly addressable and routable. Named Data Networking (NDN) architecture is an instance of ICN that is being developed as a candidate next-generation Internet architecture. By opportunistically caching content within the network, NDN appears to be well-suited for large-scale content distribution and for meeting the needs of increasingly mobile and bandwidth-hungry applications that dominate today’s Internet. One key feature of NDN is the requirement for each content object to be digitally signed by its producer. Thus, NDN should be, in principle, immune to distributing fake (aka "poisoned") content. However, in practice, this poses two challenges for detecting fake content in NDN routers: (1) overhead due to signature verification and certificate chain traversal, and (2) lack of trust context, i.e., determining which public keys are trusted to verify which content. Because of these issues, NDN does not force routers to verify content signatures, which makes the architecture susceptible to content poisoning attacks. This paper explores root causes of, and some cures for, content poisoning attacks in NDN. In the process, it becomes apparent that meaningful mitigation of content poisoning is contingent upon a network-layer trust management architecture, elements of which we construct, while carefully justifying specific design choices. This work represents the initial effort towards comprehensive trust management for NDN.

    Phillipa Gill
  • R. Hofstede, L. Hendriks, A. Sperotto, A. Pras

    Flow-based approaches for SSH intrusion detection have been developed to overcome the scalability issues of host-based alternatives. Although the detection of many SSH attacks in a flow-based fashion is fairly straightforward, no insight is typically provided in whether an attack was successful. We address this shortcoming by presenting a detection algorithm for the flow-based detection of compromises, i.e., hosts that have been compromised during an attack. Our algorithm has been implemented as part of our open-source IDS SSHCure and validated using almost 100 servers, workstations and honeypots, featuring an accuracy close to 100%.

    Hitesh Ballani
  • L. Vaquero, L. Rodero-Merino

    The cloud is migrating to the edge of the network, where routers themselves may become the virtualisation infrastructure, in an evolution labelled as “the fog”. However, many other complementary technologies are reaching a high level of maturity. Their interplay may dramatically shift the information and communication technology landscape in the following years, bringing separate technologies into a common ground. This paper offers a comprehensive definition of the fog, comprehending technologies as diverse as cloud, sensor networks, peer-to-peer networks, network virtualisation functions or configuration management techniques. We highlight the main challenges faced by this potentially breakthrough technology amalgamation.

  • N. Nikaein, M. Marina, S. Manickam, A. Dawson, R. Knopp, C. Bonnet

    Driven by the need to cope with exponentially growing mobile data traffic and to support new traffic types from massive numbers of machine-type devices, academia and industry are thinking beyond the current generation of mobile cellular networks to chalk a path towards fifth generation (5G) mobile networks. Several new approaches and technologies are being considered as potential elements making up such a future mobile network, including cloud RANs, application of SDN principles, exploiting new and unused portions of spectrum, use of massive MIMO and full-duplex communications. Research on these technologies requires realistic and flexible experimentation platforms that offer a wide range of experimentation modes from real-world experimentation to controlled and scalable evaluations while at the same time retaining backward compatibility with current generation systems. Towards this end, we present OpenAirInterface (OAI) as a suitably flexible platform. In addition, we discuss the use of OAI in the context of several widely mentioned 5G research directions.

  • kc claffy

    On 26-27 March 2014, CAIDA hosted the sixth Workshop on Active Internet Measurements (AIMS-6) as part of our series of Internet Statistics and Metrics Analysis (ISMA) workshops. As with previous AIMS workshops, the goals were to further our understanding of the potential and limitations of active measurement research and infrastructure in the wide-area Internet, and to promote cooperative solutions and coordinated strategies between academics, industry, policymakers, and funding agencies in the area of active Internet measurement. This year, we explored capabilities and opportunities for network measurement in the wireless domain, and research infrastructure to support it. Participants found the workshop content challengingly diverse, with substantial knowledge exchange regarding the wireless research infrastructure landscape(s) and existing measurement capabilities. But attendees agreed that the conversation was only beginning, and that some challenges merit further discussion, such as finding consensus on standard metrics to measure, and constructing a road map for wireless measurement research infrastructure and activities for the next decade. This report describes topics discussed at the workshop, and summarizes participants’ views of priorities for future funding as well as follow-on workshops in this area. Materials related to the workshop are available at

  • Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat

    Software-defined networks can enable a variety of concurrent, dynamically instantiated, measurement tasks, that provide fine-grain visibility into network traffic. Recently, there have been many proposals to configure TCAM counters in hardware switches to monitor traffic. However, the TCAM memory at switches is fundamentally limited and the accuracy of the measurement tasks is a function of the resources devoted to them on each switch. This paper describes an adaptive measurement framework, called DREAM, that dynamically adjusts the resources devoted to each measurement task, while ensuring a user-specified level of accuracy. Since the trade-off between resource usage and accuracy can depend upon the type of tasks, their parameters, and traffic characteristics, DREAM does not assume an a priori characterization of this trade-off, but instead dynamically searches for a resource allocation that is sufficient to achieve a desired level of accuracy. A prototype implementation and simulations with three network-wide measurement tasks (heavy hitter, hierarchical heavy hitter and change detection) and diverse traffic show that DREAM can support more concurrent tasks with higher accuracy than several other alternatives.

  • Vimalkumar Jeyakumar, Mohammad Alizadeh, Yilong Geng, Changhoon Kim, David Mazières

    This paper presents a practical approach to rapidly introducing new dataplane functionality into networks: End-hosts embed tiny programs into packets to actively query and manipulate a network’s internal state. We show how this “tiny packet program” (TPP) interface gives end-hosts unprecedented visibility into network behavior, enabling them to work with the network to achieve a desired functionality. Our design leverages what each component does best: (a) switches forward and execute tiny packet programs (at most 5 instructions) in-band at line rate, and (b) end-hosts perform arbitrary (and easily updated) computation on network state. By implementing three different research proposals, we show that TPPs are useful. Using a hardware prototype on a NetFPGA, we show our design is feasible at a reasonable cost.

  • Ethan Heilman, Danny Cooper, Leonid Reyzin, Sharon Goldberg

    The Resource Public Key Infrastructure (RPKI) is a new infrastructure that prevents some of the most devastating attacks on interdomain routing. However, the security benefits provided by the RPKI are accomplished via an architecture that empowers centralized authorities to unilaterally revoke any IP prefixes under their control. We propose mechanisms to improve the transparency of the RPKI, in order to mitigate the risk that it will be used for IP address takedowns. First, we present tools that detect and visualize changes to the RPKI that can potentially take down an IP prefix. We use our tools to identify errors and revocations in the production RPKI. Next, we propose modifications to the RPKI’s architecture to (1) require any revocation of IP address space to receive consent from all impacted parties, and (2) detect when misbehaving authorities fail to obtain consent. We present a security analysis of our architecture, and estimate its overhead using data-driven analysis.

  • Kirill Kogan, Sergey Nikolenko, Ori Rottenstreich, William Culhane, Patrick Eugster

    Efficient packet classification is a core concern for network services. Traditional multi-field classification approaches, in both software and ternary content-addressable memory (TCAMs), entail tradeoffs between (memory) space and (lookup) time. TCAMs cannot efficiently represent range rules, a common class of classification rules confining values of packet fields to given ranges. The exponential space growth of TCAM entries relative to the number of fields is exacerbated when multiple fields contain ranges. In this work, we present a novel approach which identifies properties of many classifiers which can be implemented in linear space and with worst-case guaranteed logarithmic time and allows the addition of more fields including range constraints without impacting space and time complexities. On real-life classifiers from Cisco Systems and additional classifiers from ClassBench [7] (with real parameters), 90-95% of rules are thus handled, and the other 510% of rules can be stored in TCAM to be processed in parallel.

  • Jakub Czyz, Mark Allman, Jing Zhang, Scott Iekel-Johnson, Eric Osterweil, Michael Bailey

    After several IPv4 address exhaustion milestones in the last three years, it is becoming apparent that the world is running out of IPv4 addresses, and the adoption of the next generation Internet protocol, IPv6, though nascent, is accelerating. In order to better understand this unique and disruptive transition, we explore twelve metrics using ten global-scale datasets to create the longest and broadest measurement of IPv6 adoption to date. Using this perspective, we find that adoption, relative to IPv4, varies by two orders of magnitude depending on the measure examined and that care must be taken when evaluating adoption metrics in isolation. Further, we find that regional adoption is not uniform. Finally, and perhaps most surprisingly, we find that over the last three years, the nature of IPv6 utilization—in terms of traffic, content, reliance on transition technology, and performance—has shifted dramatically from prior findings, indicating a maturing of the protocol into production mode. We believe IPv6’s recent growth and this changing utilization signal a true quantum leap.

  • Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, Mark Watson

    Existing ABR algorithms face a significant challenge in estimating future capacity: capacity can vary widely over time, a phenomenon commonly observed in commercial services. In this work, we suggest an alternative approach: rather than presuming that capacity estimation is required, it is perhaps better to begin by using only the buffer, and then ask when capacity estimation is needed. We test the viability of this approach through a series of experiments spanning millions of real users in a commercial service. We start with a simple design which directly chooses the video rate based on the current buffer occupancy. Our own investigation reveals that capacity estimation is unnecessary in steady state; however using simple capacity estimation (based on immediate past throughput) is important during the startup phase, when the buffer itself is growing from empty. This approach allows us to reduce the rebuffer rate by 10–20% compared to Netflix’s then-default ABR algorithm, while delivering a similar average video rate, and a higher video rate in steady state.

  • Tong Yang, Gaogang Xie, YanBiao Li, Qiaobin Fu, Alex X. Liu, Qi Li, Laurent Mathy

    The Forwarding Information Base (FIB) of backbone routers has been rapidly growing in size. An ideal IP lookup algorithm should achieve constant, yet small, IP lookup time and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. In this paper, we first propose SAIL, a Splitting Approach to IP Lookup. One splitting is along the dimension of the lookup process, namely finding the prefix length and finding the next hop, and another splitting is along the dimension of prefix length, namely IP lookup on prefixes of length less than or equal to 24 and IP lookup on prefixes of length longer than 24. Second, we propose a suite of algorithms for IP lookup based on our SAIL framework. Third, we implemented our algorithms on four platforms: CPU, FPGA, GPU, and many-core. We conducted extensive experiments to evaluate our algorithms using real FIBs and real traffic from a major ISP in China. Experimental results show that our SAIL algorithms are several times or even two orders of magnitude faster than well known IP lookup algorithms.

  • Peng Sun, Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin

    We present Statesman, a network-state management service that allows multiple network management applications to operate independently, while maintaining network-wide safety and performance invariants. Network state captures various aspects of the network such as which links are alive and how switches are forwarding traffic. Statesman uses three views of the network state. In observed state, it maintains an up-to-date view of the actual network state. Applications read this state and propose state changes based on their individual goals. Using a model of dependencies among state variables, Statesman merges these proposed states into a target state that is guaranteed to maintain the safety and performance invariants. It then updates the network to the target state. Statesman has been deployed in ten Microsoft Azure datacenters for several months, and three distinct applications have been built on it. We use the experience from this deployment to demonstrate how Statesman enables each application to meet its goals, while maintaining network-wide invariants.

  • Anuj Kalia, Michael Kaminsky, David G. Andersen

    This paper describes the design and implementation of HERD, a keyvalue system designed to make the best use of an RDMA network. Unlike prior RDMA-based key-value systems, HERD focuses its design on reducing network round trips while using efficient RDMA primitives; the result is substantially lower latency, and throughput that saturates modern, commodity RDMA hardware. HERD has two unconventional decisions: First, it does not use RDMA reads, despite the allure of operations that bypass the remote CPU entirely. Second, it uses a mix of RDMA and messaging verbs, despite the conventional wisdom that the messaging primitives are slow. A HERD client writes its request into the server’s memory; the server computes the reply. This design uses a single round trip for all requests and supports up to 26 million key-value operations per second with 5 µs average latency. Notably, for small key-value items, our full system throughput is similar to native RDMA read throughput and is over 2X higher than recent RDMA-based keyvalue systems. We believe that HERD further serves as an effective template for the construction of RDMA-based datacenter services.

  • Arpit Gupta, Laurent Vanbever, Muhammad Shahbaz, Sean P. Donovan, Brandon Schlinker, Nick Feamster, Jennifer Rexford, Scott Shenker, Russ Clark, Ethan Katz-Bassett

    BGP severely constrains how networks can deliver traffic over the Internet. Today’s networks can only forward traffic based on the destination IP prefix, by selecting among routes offered by their immediate neighbors. We believe Software Defined Networking (SDN) could revolutionize wide-area traffic delivery, by offering direct control over packet-processing rules that match on multiple header fields and perform a variety of actions. Internet exchange points (IXPs) are a compelling place to start, given their central role in interconnecting many networks and their growing importance in bringing popular content closer to end users. To realize a Software Defined IXP (an “SDX”), we must create compelling applications, such as “application-specific peering”— where two networks peer only for (say) streaming video traffic. We also need new programming abstractions that allow participating networks to create and run these applications and a runtime that both behaves correctly when interacting with BGP and ensures that applications do not interfere with each other. Finally, we must ensure that the system scales, both in rule-table size and computational overhead. In this paper, we tackle these challenges and demonstrate the flexibility and scalability of our solutions through controlled and in-the-wild experiments. Our experiments demonstrate that our SDX implementation can implement representative policies for hundreds of participants who advertise full routing tables while achieving sub-second convergence in response to configuration changes and routing updates.

  • Konstantinos Nikitopoulos, Juan Zhou, Ben Congdon, Kyle Jamieson

    This paper presents the design and implementation of Geosphere, a physical- and link-layer design for access point-based MIMO wireless networks that consistently improves network throughput. To send multiple streams of data in a MIMO system, prior designs rely on a technique called zero-forcing, a way of “nulling” the interference between data streams by mathematically inverting the wireless channel matrix. In general, zero-forcing is highly effective, significantly improving throughput. But in certain physical situations, the MIMO channel matrix can become “poorly conditioned,” harming performance. With these situations in mind, Geosphere uses sphere decoding, a more computationally demanding technique that can achieve higher throughput in such channels. To overcome the sphere decoder’s computational complexity when sending dense wireless constellations at a high rate, Geosphere introduces search and pruning techniques that incorporate novel geometric reasoning about the wireless constellation. These techniques reduce computational complexity of 256-QAM systems by almost one order of magnitude, bringing computational demands in line with current 16- and 64-QAM systems already realized in ASIC. Geosphere thus makes the sphere decoder practical for the first time in a 4 × 4 MIMO, 256-QAM system. Results from our WARP testbed show that Geosphere achieves throughput gains over multi-user MIMO of 2× in 4 × 4 systems and 47% in 2 × 2 MIMO systems.

  • Guan-Hua Tu, Yuanjie Li, Chunyi Peng, Chi-Yu Li, Hongyi Wang, Songwu Lu

    Control-plane protocols are complex in cellular networks. They communicate with one another along three dimensions of cross layers, cross (circuit-switched and packet-switched) domains, and cross (3G and 4G) systems. In this work, we propose signaling diagnosis tools and uncover six instances of problematic interactions. Such control-plane issues span both design defects in the 3GPP standards and operational slips by carriers. They are more damaging than data-plane failures. In the worst-case scenario, users may be out of service in 4G, or get stuck in 3G. We deduce root causes, propose solutions, and summarize learned lessons.

  • Colin Scott, Andreas Wundsam, Barath Raghavan, Aurojit Panda, Andrew Or, Jefferson Lai, Eugene Huang, Zhi Liu, Ahmed El-Hassany, Sam Whitlock, H.B. Acharya, Kyriakos Zarifis, Scott Shenker

    Software bugs are inevitable in software-defined networking control software, and troubleshooting is a tedious, time-consuming task. In this paper we discuss how to improve control software troubleshooting by presenting a technique for automatically identifying a minimal sequence of inputs responsible for triggering a given bug, without making assumptions about the language or instrumentation of the software under test. We apply our technique to five open source SDN control platforms—Floodlight, NOX, POX, Pyretic, ONOS—and illustrate how the minimal causal sequences our system found aided the troubleshooting process.

  • Ali Munir, Ghufran Baig, Syed M. Irteza, Ihsan A. Qazi, Alex X. Liu, Fahad R. Dogar

    Many data center transports have been proposed in recent times (e.g., DCTCP, PDQ, pFabric, etc). Contrary to the common perception that they are competitors (i.e., protocol A vs. protocol B), we claim that the underlying strategies used in these protocols are, in fact, complementary. Based on this insight, we design PASE, a transport framework that synthesizes existing transport strategies, namely, self-adjusting endpoints (used in TCP style protocols), innetwork prioritization (used in pFabric), and arbitration (used in PDQ). PASE is deployment friendly: it does not require any changes to the network fabric; yet, its performance is comparable to, or better than, the state-of-the-art protocols that require changes to network elements (e.g., pFabric). We evaluate PASE using simulations and testbed experiments. Our results show that PASE performs well for a wide range of application workloads and network settings.

  • David Naylor, Matthew K. Mukerjee, Peter Steenkiste

    Though most would agree that accountability and privacy are both valuable, today’s Internet provides little support for either. Previous efforts have explored ways to offer stronger guarantees for one of the two, typically at the expense of the other; indeed, at first glance accountability and privacy appear mutually exclusive. At the center of the tussle is the source address: in an accountable Internet, source addresses undeniably link packets and senders so hosts can be punished for bad behavior. In a privacy-preserving Internet, source addresses are hidden as much as possible. In this paper, we argue that a balance is possible. We introduce the Accountable and Private Internet Protocol (APIP), which splits source addresses into two separate fields — an accountability address and a return address — and introduces independent mechanisms for managing each. Accountability addresses, rather than pointing to hosts, point to accountability delegates, which agree to vouch for packets on their clients’ behalves, taking appropriate action when misbehavior is reported. With accountability handled by delegates, senders are now free to mask their return addresses; we discuss a few techniques for doing so.

  • Xin Jin, Hongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Jennifer Rexford, Roger Wattenhofer

    We present Dionysus, a system for fast, consistent network updates in software-defined networks. Dionysus encodes as a graph the consistency-related dependencies among updates at individual switches, and it then dynamically schedules these updates based on runtime differences in the update speeds of different switches. This dynamic scheduling is the key to its speed; prior update methods are slow because they pre-determine a schedule, which does not adapt to runtime conditions. Testbed experiments and data-driven simulations show that Dionysus improves the median update speed by 53–88% in both wide area and data center networks compared to prior methods.

  • Zhiyong Zhang, Ovidiu Mara, Katerina Argyraki

    When can we reason about the neutrality of a network based on external observations? We prove conditions under which it is possible to (a) detect neutrality violations and (b) localize them to specific links, based on external observations. Our insight is that, when we make external observations from different vantage points, these will most likely be inconsistent with each other if the network is not neutral. Where existing tomographic techniques try to form solvable systems of equations to infer network properties, we try to form unsolvable systems that reveal neutrality violations. We present an algorithm that relies on this idea to identify sets of nonneutral links based on external observations, and we show, through network emulation, that it achieves good accuracy for a variety of network conditions.

  • Jonathan Perry, Amy Ousterhout, Hari Balakrishnan, Devavrat Shah, Hans Fugal

    An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate control—to a centralized arbiter—of when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook’s datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks at a 240× reduction is queue lengths (4.35 Mbytes reducing to 18 Kbytes), achieves much fairer and consistent flow throughputs than the baseline TCP (5200× reduction in the standard deviation of per-flow throughput with five concurrent connections), scalability from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and a 2.5× reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.

  • Jeff Rasley, Brent Stephens, Colin Dixon, Eric Rozner, Wes Felter, Kanak Agarwal, John Carter, Rodrigo Fonseca

    Software-defined networking introduces the possibility of building self-tuning networks that constantly monitor network conditions and react rapidly to important events such as congestion. Unfortunately, state-of-the-art monitoring mechanisms for conventional networks require hundreds of milliseconds to seconds to extract global network state, like link utilization or the identity of “elephant” flows. Such latencies are adequate for responding to persistent issues, e.g., link failures or long-lasting congestion, but are inadequate for responding to transient problems, e.g., congestion induced by bursty workloads sharing a link. In this paper, we present Planck, a novel network measurement architecture that employs oversubscribed port mirroring to extract network information at 280 µs–7 ms timescales on a 1 Gbps commodity switch and 275 µs–4 ms timescales on a 10 Gbps commodity switch, over 11x and 18x faster than recent approaches, respectively (and up to 291x if switch firmware allowed buffering to be disabled on some ports). To demonstrate the value of Planck’s speed and accuracy, we use it to drive a traffic engineering application that can reroute congested flows in milliseconds. On a 10 Gbps commodity switch, Planck-driven traffic engineering achieves aggregate throughput within 1–4% of optimal for most workloads we evaluated, even with flows as small as 50 MiB, an improvement of up to 53% over previous schemes.

  • Ilias Marinos, Robert N.M. Watson, Mark Handley
  • Aaron N. Parks, Angli Liu, Shyamnath Gollakota, Joshua R. Smith

    Communication primitives such as coding and multiple antenna processing have provided significant benefits for traditional wireless systems. Existing designs, however, consume significant power and computational resources, and hence cannot be run on low complexity, power constrained backscatter devices. This paper makes two main contributions: (1) we introduce the first multi-antenna cancellation design that operates on backscatter devices while retaining a small form factor and power footprint, (2) we introduce a novel coding mechanism that enables long range communication as well as concurrent transmissions and can be decoded on backscatter devices. We build hardware prototypes of the above designs that can be powered solely using harvested energy from TV and solar sources. The results show that our designs provide benefits for both RFID and ambient backscatter systems: they enable RFID tags to communicate directly with each other at distances of tens of meters and through multiple walls. They also increase the communication rate and range achieved by ambient backscatter systems by 100X and 40X respectively. We believe that this paper represents a substantial leap in the capabilities of backscatter communication.

  • Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl, Junaid Khalid, Sourav Das, Aditya Akella

    Network functions virtualization (NFV) together with softwaredefined networking (SDN) has the potential to help operators satisfy tight service level agreements, accurately monitor and manipulate network traffic, and minimize operating expenses. However, in scenarios that require packet processing to be redistributed across a collection of network function (NF) instances, simultaneously achieving all three goals requires a framework that provides efficient, coordinated control of both internal NF state and network forwarding state. To this end, we design a control plane called OpenNF. We use carefully designed APIs and a clever combination of events and forwarding updates to address race conditions, bound overhead, and accommodate a variety of NFs. Our evaluation shows that OpenNF offers efficient state control without compromising flexibility, and requires modest additions to NFs.

  • Hongqiang Harry Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, David Gelernter

    Network faults such as link failures and high switch configuration delays can cause heavy congestion and packet loss. Because it takes time for the traffic engineering systems to detect and react to such faults, these conditions can last long—even tens of seconds. We propose forward fault correction (FFC), a proactive approach for handling faults. FFC spreads network traffic such that freedom from congestion is guaranteed under arbitrary combinations of up to k faults. We show how FFC can be practically realized by compactly encoding the constraints that arise from this large number of possible faults and solving them efficiently using sorting networks. Experiments with data from real networks show that, with negligible loss in overall network throughput, FFC can reduce data loss by a factor of 7–130 in well-provisioned networks, and reduce the loss of high-priority traffic to almost zero in well-utilized networks.

  • Mosharaf Chowdhury, Yuan Zhong, Ion Stoica

    Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flowlevel metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16× faster on average and up to 2× more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5×.

  • Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, George Varghese

    We present the design, implementation, and evaluation of CONGA, a network-based distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits recent trends including the use of regular Clos topologies and overlays for network virtualization. It splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. This enables CONGA to efficiently balance load and seamlessly handle asymmetry, without requiring any TCP modifications. CONGA has been implemented in custom ASICs as part of a new datacenter fabric. In testbed experiments, CONGA has 5× better flow completion times than ECMP even with a single link failure and achieves 2–8× better throughput than MPTCP in Incast scenarios. Further, the Price of Anarchy for CONGA is provably small in Leaf-Spine topologies; hence CONGA is nearly as effective as a centralized scheduler while being able to react to congestion in microseconds. Our main thesis is that datacenter fabric load balancing is best done in the network, and requires global schemes such as CONGA to handle asymmetry.

  • Rohan Gandhi, Hongqiang Harry Liu, Y. Charlie Hu, Guohan Lu, Jitendra Padhye, Lihua Yuan, Ming Zhang

    Load balancing is a foundational function of datacenter infrastructures and is critical to the performance of online services hosted in datacenters. As the demand for cloud services grows, expensive and hard-to-scale dedicated hardware load balancers are being replaced with software load balancers that scale using a distributed data plane that runs on commodity servers. Software load balancers offer low cost, high availability and high flexibility, but suffer high latency and low capacity per load balancer, making them less than ideal for applications that demand either high throughput, or low latency or both. In this paper, we present D UET, which offers all the benefits of software load balancer, along with low latency and high availability – at next to no cost. We do this by exploiting a hitherto overlooked resource in the data center networks – the switches themselves. We show how to embed the load balancing functionality into existing hardware switches, thereby achieving organic scalability at no extra cost. For flexibility and high availability, D UET seamlessly integrates the switch-based load balancer with a small deployment of software load balancer. We enumerate and solve several architectural and algorithmic challenges involved in building such a hybrid load balancer. We evaluate D UET using a prototype implementation, as well as extensive simulations driven by traces from our production data centers. Our evaluation shows that D UET provides 10x more capacity than a software load balancer, at a fraction of a cost, while reducing latency by a factor of 10 or more, and is able to quickly adapt to network dynamics including failures.

  • Simon Peter, Umar Javed, Qiao Zhang, Doug Woos, Thomas Anderson, Arvind Krishnamurthy

    A longstanding problem with the Internet is that it is vulnerable to outages, black holes, hijacking and denial of service. Although architectural solutions have been proposed to address many of these issues, they have had difficulty being adopted due to the need for widespread adoption before most users would see any benefit. This is especially relevant as the Internet is increasingly used for applications where correct and continuous operation is essential. In this paper, we study whether a simple, easy to implement model is sufficient for addressing the aforementioned Internet vulnerabilities. Our model, called ARROW (Advertised Reliable Routing Over Waypoints), is designed to allow users to configure reliable and secure end to end paths through participating providers. With ARROW, a highly reliable ISP offers tunneled transit through its network, along with packet transformation at the ingress, as a service to remote paying customers. Those customers can stitch together reliable end to end paths through a combination of participating and non-participating ISPs in order to improve the faulttolerance, robustness, and security of mission critical transmissions. Unlike efforts to redesign the Internet from scratch, we show that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP. To demonstrate ARROW, we have added it to a small-scale wide-area ISP we control. We evaluate its performance and failure recovery properties in both simulation and live settings.

  • Bryce Kellogg, Aaron Parks, Shyamnath Gollakota, Joshua R. Smith, David Wetherall

    RF-powered computers are small devices that compute and communicate using only the power that they harvest from RF signals. While existing technologies have harvested power from ambient RF sources (e.g., TV broadcasts), they require a dedicated gateway (like an RFID reader) for Internet connectivity. We present Wi-Fi Backscatter, a novel communication system that bridges RF-powered devices with the Internet. Specifically, we show that it is possible to reuse existing Wi-Fi infrastructure to provide Internet connectivity to RF-powered devices. To show Wi-Fi Backscatter’s feasibility, we build a hardware prototype and demonstrate the first communication link between an RF-powered device and commodity Wi-Fi devices. We use off-the-shelf Wi-Fi devices including Intel Wi-Fi cards, Linksys Routers, and our organization’s Wi-Fi infrastructure, and achieve communication rates of up to 1 kbps and ranges of up to 2.1 meters. We believe that this new capability can pave the way for the rapid deployment and adoption of RF-powered devices and achieve ubiquitous connectivity via nearby mobile devices that are Wi-Fi enabled.

  • Swarun Kumar, Ezzeldin Hamed, Dina Katabi, Li Erran Li

    Despite the rapid growth of next-generation cellular networks, researchers and end-users today have limited visibility into the performance and problems of these networks. As LTE deployments move towards femto and pico cells, even operators struggle to fully understand the propagation and interference patterns affecting their service, particularly indoors. This paper introduces LTEye, the first open platform to monitor and analyze LTE radio performance at a fine temporal and spatial granularity. LTEye accesses the LTE PHY layer without requiring private user information or provider support. It provides deep insights into the PHY-layer protocols deployed in these networks. LTEye’s analytics enable researchers and policy makers to uncover serious deficiencies in these networks due to inefficient spectrum utilization and inter-cell interference. In addition, LTEye extends synthetic aperture radar (SAR), widely used for radar and backscatter signals, to operate over cellular signals. This enables businesses and end-users to localize mobile users and capture the distribution of LTE performance across spatial locations in their facility. As a result, they can diagnose problems and better plan deployment of repeaters or femto cells. We implement LTEye on USRP software radios, and present empirical insights and analytics from multiple AT&T and Verizon base stations in our locality.

  • Ryan Craven, Robert Beverly, Mark Allman

    Understanding, measuring, and debugging IP networks, particularly across administrative domains, is challenging. One particularly daunting aspect of the challenge is the presence of transparent middleboxes—which are now common in today’s Internet. In-path middleboxes that modify packet headers are typically transparent to a TCP, yet can impact end-to-end performance or cause blackholes. We develop TCP HICCUPS to reveal packet header manipulation to both endpoints of a TCP connection. HICCUPS permits endpoints to cooperate with currently opaque middleboxes without prior knowledge of their behavior. For example, with visibility into end-to-end behavior, a TCP can selectively enable or disable performance enhancing options. This cooperation enables protocol innovation by allowing new IP or TCP functionality (e.g., ECN, SACK, Multipath TCP, Tcpcrypt) to be deployed without fear of such functionality being misconstrued, modified, or blocked along a path. HICCUPS is incrementally deployable and introduces no new options. We implement and deploy TCP HICCUPS across thousands of disparate Internet paths, highlighting the breadth and scope of subtle and hard to detect middlebox behaviors encountered. We then show how path diagnostic capabilities provided by HICCUPS can benefit applications and the network.

  • Fahad R. Dogar, Thomas Karagiannis, Hitesh Ballani, Antony Rowstron

    Many data center applications perform rich and complex tasks (e.g., executing a search query or generating a user’s news-feed). From a network perspective, these tasks typically comprise multiple flows, which traverse different parts of the network at potentially different times. Most network resource allocation schemes, however, treat all these flows in isolation – rather than as part of a task – and therefore only optimize flow-level metrics. In this paper, we show that task-aware network scheduling, which groups flows of a task and schedules them together, can reduce both the average as well as tail completion time for typical data center applications. To achieve these benefits in practice, we design and implement Baraat, a decentralized task-aware scheduling system. Baraat schedules tasks in a FIFO order but avoids head-of-line blocking by dynamically changing the level of multiplexing in the network. Through experiments with Memcached on a small testbed and large-scale simulations, we show that Baraat outperforms state-of-the-art decentralized schemes (e.g., pFabric) as well as centralized schedulers (e.g., Orchestra) for a wide range of workloads (e.g., search, analytics, etc).

Syndicate content