Computer Communication Review: Papers

  • Anuj Kalia, Michael Kaminsky, David G. Andersen

    This paper describes the design and implementation of HERD, a key-value system designed to make the best use of an RDMA network. Unlike prior RDMA-based key-value systems, HERD focuses its design on reducing network round trips while using efficient RDMA primitives; the result is substantially lower latency, and throughput that saturates modern, commodity RDMA hardware. HERD makes two unconventional design decisions: First, it does not use RDMA reads, despite the allure of operations that bypass the remote CPU entirely. Second, it uses a mix of RDMA and messaging verbs, despite the conventional wisdom that the messaging primitives are slow. A HERD client writes its request into the server’s memory; the server computes the reply. This design uses a single round trip for all requests and supports up to 26 million key-value operations per second with 5 µs average latency. Notably, for small key-value items, our full system throughput is similar to native RDMA read throughput and is over 2X higher than recent RDMA-based key-value systems. We believe that HERD further serves as an effective template for the construction of RDMA-based datacenter services.
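
    To make the request flow concrete, here is a minimal Python sketch of the pattern the abstract describes: clients place requests directly into per-client slots in the server's memory (the role an RDMA write plays in HERD), and the server polls those slots and computes replies. The slot layout and all names are illustrative stand-ins, not HERD's actual verbs-based implementation.

```python
# Hedged sketch: emulates HERD's one-round-trip request flow with an
# in-process buffer. In the real system a client's RDMA write places the
# request directly in server memory and the reply returns via a SEND;
# here both directions are simulated in one process.

REQ_SLOTS = 8  # one request slot per client (hypothetical sizing)

# Each slot: None when free, else (opcode, key, value)
request_memory = [None] * REQ_SLOTS
store = {}

def client_issue(slot, opcode, key, value=None):
    """Stands in for the client's RDMA write into server memory."""
    request_memory[slot] = (opcode, key, value)

def server_poll_once():
    """Server polls slots, executes requests, and 'sends' replies."""
    replies = {}
    for slot, req in enumerate(request_memory):
        if req is None:
            continue
        opcode, key, value = req
        if opcode == "PUT":
            store[key] = value
            replies[slot] = "OK"
        elif opcode == "GET":
            replies[slot] = store.get(key)
        request_memory[slot] = None  # free the slot for the next request
    return replies

client_issue(0, "PUT", "k1", "v1")
client_issue(1, "GET", "k1")
print(server_poll_once())  # {0: 'OK', 1: 'v1'}
```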

  • Arpit Gupta, Laurent Vanbever, Muhammad Shahbaz, Sean P. Donovan, Brandon Schlinker, Nick Feamster, Jennifer Rexford, Scott Shenker, Russ Clark, Ethan Katz-Bassett

    BGP severely constrains how networks can deliver traffic over the Internet. Today’s networks can only forward traffic based on the destination IP prefix, by selecting among routes offered by their immediate neighbors. We believe Software Defined Networking (SDN) could revolutionize wide-area traffic delivery, by offering direct control over packet-processing rules that match on multiple header fields and perform a variety of actions. Internet exchange points (IXPs) are a compelling place to start, given their central role in interconnecting many networks and their growing importance in bringing popular content closer to end users. To realize a Software Defined IXP (an “SDX”), we must create compelling applications, such as “application-specific peering”— where two networks peer only for (say) streaming video traffic. We also need new programming abstractions that allow participating networks to create and run these applications and a runtime that both behaves correctly when interacting with BGP and ensures that applications do not interfere with each other. Finally, we must ensure that the system scales, both in rule-table size and computational overhead. In this paper, we tackle these challenges and demonstrate the flexibility and scalability of our solutions through controlled and in-the-wild experiments. Our experiments demonstrate that our SDX implementation can implement representative policies for hundreds of participants who advertise full routing tables while achieving sub-second convergence in response to configuration changes and routing updates.

  • Konstantinos Nikitopoulos, Juan Zhou, Ben Congdon, Kyle Jamieson

    This paper presents the design and implementation of Geosphere, a physical- and link-layer design for access point-based MIMO wireless networks that consistently improves network throughput. To send multiple streams of data in a MIMO system, prior designs rely on a technique called zero-forcing, a way of “nulling” the interference between data streams by mathematically inverting the wireless channel matrix. In general, zero-forcing is highly effective, significantly improving throughput. But in certain physical situations, the MIMO channel matrix can become “poorly conditioned,” harming performance. With these situations in mind, Geosphere uses sphere decoding, a more computationally demanding technique that can achieve higher throughput in such channels. To overcome the sphere decoder’s computational complexity when sending dense wireless constellations at a high rate, Geosphere introduces search and pruning techniques that incorporate novel geometric reasoning about the wireless constellation. These techniques reduce the computational complexity of 256-QAM systems by almost one order of magnitude, bringing computational demands in line with current 16- and 64-QAM systems already realized in ASIC. Geosphere thus makes the sphere decoder practical for the first time in a 4 × 4 MIMO, 256-QAM system. Results from our WARP testbed show that Geosphere achieves throughput gains over multi-user MIMO of 2× in 4 × 4 systems and 47% in 2 × 2 MIMO systems.
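
    For readers unfamiliar with sphere decoding, the following is a minimal textbook-style sketch of the depth-first search it performs for y = Hx + n. Geosphere's geometric pruning for dense constellations is not reproduced here; the QPSK constellation and 2 × 2 dimensions are chosen only to keep the example small.

```python
import numpy as np

# Hedged sketch of a depth-first sphere decoder for y = Hx + n.
# QR-decompose H, then search symbol combinations layer by layer,
# pruning any branch whose partial distance exceeds the best leaf so far.

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def sphere_decode(H, y, constellation=QPSK):
    n = H.shape[1]
    Q, R = np.linalg.qr(H)
    z = Q.conj().T @ y          # noise-free: z == R @ x
    best = {"dist": np.inf, "x": None}

    def search(layer, partial, dist):
        if dist >= best["dist"]:      # prune: outside current sphere
            return
        if layer < 0:                 # reached a leaf: full candidate
            best["dist"], best["x"] = dist, partial.copy()
            return
        for s in constellation:
            partial[layer] = s
            # residual at this layer given symbols chosen so far
            r = z[layer] - R[layer, layer:] @ partial[layer:]
            search(layer - 1, partial, dist + abs(r) ** 2)

    search(n - 1, np.zeros(n, dtype=complex), 0.0)
    return best["x"]

H = (np.random.randn(2, 2) + 1j * np.random.randn(2, 2)) / np.sqrt(2)
x = QPSK[[0, 3]]
print(np.allclose(sphere_decode(H, H @ x), x))  # True (noise-free case)
```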

  • Guan-Hua Tu, Yuanjie Li, Chunyi Peng, Chi-Yu Li, Hongyi Wang, Songwu Lu

    Control-plane protocols are complex in cellular networks. They communicate with one another along three dimensions of cross layers, cross (circuit-switched and packet-switched) domains, and cross (3G and 4G) systems. In this work, we propose signaling diagnosis tools and uncover six instances of problematic interactions. Such control-plane issues span both design defects in the 3GPP standards and operational slips by carriers. They are more damaging than data-plane failures. In the worst-case scenario, users may be out of service in 4G, or get stuck in 3G. We deduce root causes, propose solutions, and summarize learned lessons.

  • Colin Scott, Andreas Wundsam, Barath Raghavan, Aurojit Panda, Andrew Or, Jefferson Lai, Eugene Huang, Zhi Liu, Ahmed El-Hassany, Sam Whitlock, H.B. Acharya, Kyriakos Zarifis, Scott Shenker

    Software bugs are inevitable in software-defined networking control software, and troubleshooting is a tedious, time-consuming task. In this paper we discuss how to improve control software troubleshooting by presenting a technique for automatically identifying a minimal sequence of inputs responsible for triggering a given bug, without making assumptions about the language or instrumentation of the software under test. We apply our technique to five open source SDN control platforms—Floodlight, NOX, POX, Pyretic, ONOS—and illustrate how the minimal causal sequences our system found aided the troubleshooting process.

  • Ali Munir, Ghufran Baig, Syed M. Irteza, Ihsan A. Qazi, Alex X. Liu, Fahad R. Dogar

    Many data center transports have been proposed in recent times (e.g., DCTCP, PDQ, pFabric). Contrary to the common perception that they are competitors (i.e., protocol A vs. protocol B), we claim that the underlying strategies used in these protocols are, in fact, complementary. Based on this insight, we design PASE, a transport framework that synthesizes existing transport strategies, namely, self-adjusting endpoints (used in TCP-style protocols), in-network prioritization (used in pFabric), and arbitration (used in PDQ). PASE is deployment friendly: it does not require any changes to the network fabric; yet, its performance is comparable to, or better than, the state-of-the-art protocols that require changes to network elements (e.g., pFabric). We evaluate PASE using simulations and testbed experiments. Our results show that PASE performs well for a wide range of application workloads and network settings.

  • David Naylor, Matthew K. Mukerjee, Peter Steenkiste

    Though most would agree that accountability and privacy are both valuable, today’s Internet provides little support for either. Previous efforts have explored ways to offer stronger guarantees for one of the two, typically at the expense of the other; indeed, at first glance accountability and privacy appear mutually exclusive. At the center of the tussle is the source address: in an accountable Internet, source addresses undeniably link packets and senders so hosts can be punished for bad behavior. In a privacy-preserving Internet, source addresses are hidden as much as possible. In this paper, we argue that a balance is possible. We introduce the Accountable and Private Internet Protocol (APIP), which splits source addresses into two separate fields — an accountability address and a return address — and introduces independent mechanisms for managing each. Accountability addresses, rather than pointing to hosts, point to accountability delegates, which agree to vouch for packets on their clients’ behalf, taking appropriate action when misbehavior is reported. With accountability handled by delegates, senders are now free to mask their return addresses; we discuss a few techniques for doing so.
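
    A toy sketch of the address split described above, with hypothetical names throughout: a packet carries a delegate-facing accountability address and a separate return address, and the delegate's brief/verify exchange is reduced to a set lookup for illustration; APIP's actual protocol is richer than this.

```python
from dataclasses import dataclass

# Hedged sketch of APIP's two-address packet. The accountability address
# names a delegate that vouches for the packet; the return address (which
# may be masked, e.g. via a relay) is only used to route replies.

@dataclass
class Packet:
    accountability_addr: str   # points at a delegate, not the sender
    return_addr: str           # may be obscured, e.g. a NAT or relay
    payload: bytes
    fingerprint: int = 0

class Delegate:
    def __init__(self):
        self.briefed = set()
    def brief(self, pkt):       # sender reports what it sent
        self.briefed.add(pkt.fingerprint)
    def verify(self, pkt):      # receiver/router asks: do you vouch?
        return pkt.fingerprint in self.briefed
    def shutoff(self, pkt):     # on reported misbehavior, stop vouching
        self.briefed.discard(pkt.fingerprint)

d = Delegate()
pkt = Packet("delegate.example", "relay-42", b"hello",
             fingerprint=hash(b"hello"))
d.brief(pkt)
print(d.verify(pkt))   # True: the delegate vouches for this packet
```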

  • Xin Jin, Hongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Jennifer Rexford, Roger Wattenhofer

    We present Dionysus, a system for fast, consistent network updates in software-defined networks. Dionysus encodes as a graph the consistency-related dependencies among updates at individual switches, and it then dynamically schedules these updates based on runtime differences in the update speeds of different switches. This dynamic scheduling is the key to its speed; prior update methods are slow because they pre-determine a schedule, which does not adapt to runtime conditions. Testbed experiments and data-driven simulations show that Dionysus improves the median update speed by 53–88% in both wide area and data center networks compared to prior methods.
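
    The scheduling idea can be illustrated with a small sketch (hypothetical update names; not Dionysus's actual algorithm, which also handles resource constraints and cycles): updates are issued as soon as their dependencies complete, so faster switches naturally pull the schedule forward.

```python
# Hedged sketch of dependency-driven update scheduling in the spirit of
# Dionysus: issue every switch update whose dependencies are met, and let
# completion order (i.e., observed switch speed) drive what becomes ready.

deps = {            # update -> updates it must wait for
    "add_path_s2": set(),
    "move_flow_s1": {"add_path_s2"},   # new path must exist first
    "del_path_s1": {"move_flow_s1"},   # remove old path only after move
}

def schedule(deps, completion_order):
    done, issued, plan = set(), set(), []
    while len(done) < len(deps):
        ready = [u for u in deps if u not in issued and deps[u] <= done]
        issued |= set(ready)
        plan.append(("issue", sorted(ready)))
        # a runtime would observe which issued update finishes next;
        # here we take that from completion_order
        nxt = next(u for u in completion_order
                   if u in issued and u not in done)
        done.add(nxt)
        plan.append(("completed", nxt))
    return plan

order = ["add_path_s2", "move_flow_s1", "del_path_s1"]
for step in schedule(deps, order):
    print(step)
```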

  • Zhiyong Zhang, Ovidiu Mara, Katerina Argyraki

    When can we reason about the neutrality of a network based on external observations? We prove conditions under which it is possible to (a) detect neutrality violations and (b) localize them to specific links, based on external observations. Our insight is that, when we make external observations from different vantage points, these will most likely be inconsistent with each other if the network is not neutral. Where existing tomographic techniques try to form solvable systems of equations to infer network properties, we try to form unsolvable systems that reveal neutrality violations. We present an algorithm that relies on this idea to identify sets of non-neutral links based on external observations, and we show, through network emulation, that it achieves good accuracy for a variety of network conditions.
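
    As a toy rendering of the "unsolvable systems" idea: suppose each path measurement is the sum of per-link quantities; if no per-link assignment can explain all the measurements, the least-squares residual is large and a violation is flagged. The topology and numbers below are invented, and real tomography must contend with noise.

```python
import numpy as np

# Hedged toy version of the abstract's idea: an unsolvable system of
# path equations suggests some link treats traffic classes differently.

# Rows: observed paths; columns: links l0..l2 (1 = path uses the link)
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

def neutral_consistency(A, path_delays, tol=1e-6):
    x, *_ = np.linalg.lstsq(A, path_delays, rcond=None)
    resid = np.linalg.norm(A @ x - path_delays)
    return resid < tol      # solvable system -> consistent with neutrality

consistent = np.array([3.0, 4.0, 5.0, 6.0])   # link delays (1, 2, 3) fit
skewed = np.array([3.0, 4.0, 5.0, 9.0])       # no link assignment fits
print(neutral_consistency(A, consistent))  # True
print(neutral_consistency(A, skewed))      # False -> possible violation
```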

  • Jonathan Perry, Amy Ousterhout, Hari Balakrishnan, Devavrat Shah, Hans Fugal

    An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate control—to a centralized arbiter—of when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook’s datacenter network. Our results show that Fastpass achieves throughput comparable to current networks with a 240× reduction in queue lengths (4.35 Mbytes reduced to 18 Kbytes), achieves much fairer and more consistent flow throughputs than baseline TCP (a 5200× reduction in the standard deviation of per-flow throughput with five concurrent connections), scales from 1 to 8 cores in the arbiter implementation (scheduling 2.21 Terabits/s of traffic in software on eight cores), and reduces TCP retransmissions by 2.5× in a latency-sensitive service at Facebook.
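
    A minimal sketch of per-timeslot allocation in this spirit (not Fastpass's pipelined algorithm, and ignoring path selection): in each timeslot, grant pending demands greedily so that every endpoint sends at most one packet and receives at most one packet, deferring the rest to later timeslots.

```python
from collections import deque

# Hedged sketch of an arbiter's timeslot allocation: a greedy maximal
# matching of (src, dst) demands per timeslot, FIFO across timeslots.

def allocate(demands, num_slots):
    """demands: list of (src, dst) unit-sized packet requests, in order."""
    pending = deque(demands)
    schedule = []
    for _ in range(num_slots):
        busy_src, busy_dst = set(), set()
        granted, deferred = [], deque()
        while pending:
            src, dst = pending.popleft()
            if src in busy_src or dst in busy_dst:
                deferred.append((src, dst))   # endpoint already used
            else:
                busy_src.add(src)
                busy_dst.add(dst)
                granted.append((src, dst))
        schedule.append(granted)
        pending = deferred
    return schedule, list(pending)

demands = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]
slots, leftover = allocate(demands, num_slots=2)
for t, granted in enumerate(slots):
    print(f"timeslot {t}: {granted}")  # A->B and D->C, then A->C and D->B
```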

  • Jeff Rasley, Brent Stephens, Colin Dixon, Eric Rozner, Wes Felter, Kanak Agarwal, John Carter, Rodrigo Fonseca

    Software-defined networking introduces the possibility of building self-tuning networks that constantly monitor network conditions and react rapidly to important events such as congestion. Unfortunately, state-of-the-art monitoring mechanisms for conventional networks require hundreds of milliseconds to seconds to extract global network state, like link utilization or the identity of “elephant” flows. Such latencies are adequate for responding to persistent issues, e.g., link failures or long-lasting congestion, but are inadequate for responding to transient problems, e.g., congestion induced by bursty workloads sharing a link. In this paper, we present Planck, a novel network measurement architecture that employs oversubscribed port mirroring to extract network information at 280 µs–7 ms timescales on a 1 Gbps commodity switch and 275 µs–4 ms timescales on a 10 Gbps commodity switch, over 11x and 18x faster than recent approaches, respectively (and up to 291x if switch firmware allowed buffering to be disabled on some ports). To demonstrate the value of Planck’s speed and accuracy, we use it to drive a traffic engineering application that can reroute congested flows in milliseconds. On a 10 Gbps commodity switch, Planck-driven traffic engineering achieves aggregate throughput within 1–4% of optimal for most workloads we evaluated, even with flows as small as 50 MiB, an improvement of up to 53% over previous schemes.

  • Ilias Marinos, Robert N.M. Watson, Mark Handley
  • Aaron N. Parks, Angli Liu, Shyamnath Gollakota, Joshua R. Smith

    Communication primitives such as coding and multiple antenna processing have provided significant benefits for traditional wireless systems. Existing designs, however, consume significant power and computational resources, and hence cannot be run on low complexity, power constrained backscatter devices. This paper makes two main contributions: (1) we introduce the first multi-antenna cancellation design that operates on backscatter devices while retaining a small form factor and power footprint, (2) we introduce a novel coding mechanism that enables long range communication as well as concurrent transmissions and can be decoded on backscatter devices. We build hardware prototypes of the above designs that can be powered solely using harvested energy from TV and solar sources. The results show that our designs provide benefits for both RFID and ambient backscatter systems: they enable RFID tags to communicate directly with each other at distances of tens of meters and through multiple walls. They also increase the communication rate and range achieved by ambient backscatter systems by 100X and 40X respectively. We believe that this paper represents a substantial leap in the capabilities of backscatter communication.

  • Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl, Junaid Khalid, Sourav Das, Aditya Akella

    Network functions virtualization (NFV) together with software-defined networking (SDN) has the potential to help operators satisfy tight service level agreements, accurately monitor and manipulate network traffic, and minimize operating expenses. However, in scenarios that require packet processing to be redistributed across a collection of network function (NF) instances, simultaneously achieving all three goals requires a framework that provides efficient, coordinated control of both internal NF state and network forwarding state. To this end, we design a control plane called OpenNF. We use carefully designed APIs and a clever combination of events and forwarding updates to address race conditions, bound overhead, and accommodate a variety of NFs. Our evaluation shows that OpenNF offers efficient state control without compromising flexibility, and requires modest additions to NFs.

  • Hongqiang Harry Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, David Gelernter

    Network faults such as link failures and high switch configuration delays can cause heavy congestion and packet loss. Because it takes time for traffic engineering systems to detect and react to such faults, these conditions can persist for a long time—even tens of seconds. We propose forward fault correction (FFC), a proactive approach for handling faults. FFC spreads network traffic such that freedom from congestion is guaranteed under arbitrary combinations of up to k faults. We show how FFC can be practically realized by compactly encoding the constraints that arise from this large number of possible faults and solving them efficiently using sorting networks. Experiments with data from real networks show that, with negligible loss in overall network throughput, FFC can reduce data loss by a factor of 7–130 in well-provisioned networks, and reduce the loss of high-priority traffic to almost zero in well-utilized networks.
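
    The guarantee can be checked by brute force on a toy example; FFC's contribution is avoiding exactly this enumeration via a compact constraint encoding and sorting networks. The topology, tunnels, and rates below are invented: for every combination of up to k failed links, rescale each flow onto its surviving tunnels and verify that no link exceeds capacity.

```python
from itertools import combinations

# Hedged brute-force illustration of FFC's congestion-freedom property.

links = {"l1": 10.0, "l2": 10.0, "l3": 10.0}       # link capacities
flows = {  # flow -> (rate, tunnels as link-sets); rate split evenly
    "f1": (9.0, [{"l1"}, {"l2"}]),
    "f2": (6.0, [{"l2"}, {"l3"}]),
}

def congestion_free(links, flows, k):
    for r in range(k + 1):
        for failed in combinations(links, r):
            load = dict.fromkeys(links, 0.0)
            for rate, tunnels in flows.values():
                alive = [t for t in tunnels if not (t & set(failed))]
                if not alive:
                    return False          # a flow is disconnected
                for t in alive:           # rescale rate onto survivors
                    for l in t:
                        load[l] += rate / len(alive)
            if any(load[l] > links[l] for l in links if l not in failed):
                return False              # some surviving link overloads
    return True

print(congestion_free(links, flows, k=0))  # True: l2 carries 4.5 + 3
print(congestion_free(links, flows, k=1))  # False: l1 down puts 9 + 3 on l2
```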

  • Mosharaf Chowdhury, Yuan Zhong, Ion Stoica

    Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16× faster on average and up to 2× more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5×.
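
    One such heuristic can be sketched in a few lines: treat a coflow's effective size as the completion time of its most loaded port and serve smaller coflows first (a smallest-bottleneck-first ordering). Port bandwidths and demands below are invented, and Varys's rate allocation and starvation-freedom machinery are omitted.

```python
# Hedged sketch of bottleneck-first coflow ordering: a coflow finishes no
# earlier than its most loaded port, so order coflows by that bottleneck.

PORT_GBPS = 10.0  # assumed uniform port bandwidth

def bottleneck_time(coflow):
    """coflow: dict port -> bytes to move through that port."""
    return max(coflow.values()) / PORT_GBPS

coflows = {
    "job1": {"p1": 40.0, "p2": 10.0},   # bottleneck: 4.0 time units
    "job2": {"p1": 10.0, "p3": 10.0},   # bottleneck: 1.0
    "job3": {"p2": 25.0},               # bottleneck: 2.5
}

order = sorted(coflows, key=lambda c: bottleneck_time(coflows[c]))
print(order)  # ['job2', 'job3', 'job1']: small coflows are not blocked
```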

  • Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, George Varghese

    We present the design, implementation, and evaluation of CONGA, a network-based distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits recent trends including the use of regular Clos topologies and overlays for network virtualization. It splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. This enables CONGA to efficiently balance load and seamlessly handle asymmetry, without requiring any TCP modifications. CONGA has been implemented in custom ASICs as part of a new datacenter fabric. In testbed experiments, CONGA has 5× better flow completion times than ECMP even with a single link failure and achieves 2–8× better throughput than MPTCP in Incast scenarios. Further, the Price of Anarchy for CONGA is provably small in Leaf-Spine topologies; hence CONGA is nearly as effective as a centralized scheduler while being able to react to congestion in microseconds. Our main thesis is that datacenter fabric load balancing is best done in the network, and requires global schemes such as CONGA to handle asymmetry.
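
    The flowlet mechanic is easy to sketch in software (CONGA implements it in an ASIC; the gap threshold and congestion values below are assumed): a packet gap larger than a threshold opens a new flowlet, which can be re-routed to the currently least congested path without reordering packets.

```python
# Hedged sketch of flowlet-based path choice. In CONGA, per-path
# congestion estimates come from fabric feedback; here they are passed in.

FLOWLET_GAP = 500e-6  # seconds; the threshold is an assumed figure

class FlowletRouter:
    def __init__(self):
        self.last_seen = {}   # flow -> time of last packet
        self.path_of = {}     # flow -> path pinned for current flowlet

    def route(self, flow, now, path_congestion):
        gap = now - self.last_seen.get(flow, float("-inf"))
        if flow not in self.path_of or gap > FLOWLET_GAP:
            # new flowlet: safe to re-route without reordering packets
            self.path_of[flow] = min(path_congestion,
                                     key=path_congestion.get)
        self.last_seen[flow] = now
        return self.path_of[flow]

r = FlowletRouter()
print(r.route("f", 0.0,    {"pathA": 0.2, "pathB": 0.7}))  # pathA
print(r.route("f", 0.0001, {"pathA": 0.9, "pathB": 0.1}))  # sticks: pathA
print(r.route("f", 0.01,   {"pathA": 0.9, "pathB": 0.1}))  # new flowlet: pathB
```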

  • Rohan Gandhi, Hongqiang Harry Liu, Y. Charlie Hu, Guohan Lu, Jitendra Padhye, Lihua Yuan, Ming Zhang

    Load balancing is a foundational function of datacenter infrastructures and is critical to the performance of online services hosted in datacenters. As the demand for cloud services grows, expensive and hard-to-scale dedicated hardware load balancers are being replaced with software load balancers that scale using a distributed data plane that runs on commodity servers. Software load balancers offer low cost, high availability, and high flexibility, but suffer from high latency and low capacity per load balancer, making them less than ideal for applications that demand high throughput, low latency, or both. In this paper, we present Duet, which offers all the benefits of software load balancers, along with low latency and high availability – at next to no cost. We do this by exploiting a hitherto overlooked resource in data center networks – the switches themselves. We show how to embed the load balancing functionality into existing hardware switches, thereby achieving organic scalability at no extra cost. For flexibility and high availability, Duet seamlessly integrates the switch-based load balancer with a small deployment of software load balancers. We enumerate and solve several architectural and algorithmic challenges involved in building such a hybrid load balancer. We evaluate Duet using a prototype implementation, as well as extensive simulations driven by traces from our production data centers. Our evaluation shows that Duet provides 10x more capacity than a software load balancer at a fraction of the cost, while reducing latency by a factor of 10 or more, and is able to quickly adapt to network dynamics, including failures.

  • Simon Peter, Umar Javed, Qiao Zhang, Doug Woos, Thomas Anderson, Arvind Krishnamurthy

    A longstanding problem with the Internet is that it is vulnerable to outages, black holes, hijacking and denial of service. Although architectural solutions have been proposed to address many of these issues, they have had difficulty being adopted due to the need for widespread adoption before most users would see any benefit. This is especially relevant as the Internet is increasingly used for applications where correct and continuous operation is essential. In this paper, we study whether a simple, easy-to-implement model is sufficient for addressing the aforementioned Internet vulnerabilities. Our model, called ARROW (Advertised Reliable Routing Over Waypoints), is designed to allow users to configure reliable and secure end-to-end paths through participating providers. With ARROW, a highly reliable ISP offers tunneled transit through its network, along with packet transformation at the ingress, as a service to remote paying customers. Those customers can stitch together reliable end-to-end paths through a combination of participating and non-participating ISPs in order to improve the fault-tolerance, robustness, and security of mission critical transmissions. Unlike efforts to redesign the Internet from scratch, we show that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP. To demonstrate ARROW, we have added it to a small-scale wide-area ISP we control. We evaluate its performance and failure recovery properties in both simulation and live settings.

  • Bryce Kellogg, Aaron Parks, Shyamnath Gollakota, Joshua R. Smith, David Wetherall

    RF-powered computers are small devices that compute and communicate using only the power that they harvest from RF signals. While existing technologies have harvested power from ambient RF sources (e.g., TV broadcasts), they require a dedicated gateway (like an RFID reader) for Internet connectivity. We present Wi-Fi Backscatter, a novel communication system that bridges RF-powered devices with the Internet. Specifically, we show that it is possible to reuse existing Wi-Fi infrastructure to provide Internet connectivity to RF-powered devices. To show Wi-Fi Backscatter’s feasibility, we build a hardware prototype and demonstrate the first communication link between an RF-powered device and commodity Wi-Fi devices. We use off-the-shelf Wi-Fi devices including Intel Wi-Fi cards, Linksys Routers, and our organization’s Wi-Fi infrastructure, and achieve communication rates of up to 1 kbps and ranges of up to 2.1 meters. We believe that this new capability can pave the way for the rapid deployment and adoption of RF-powered devices and achieve ubiquitous connectivity via nearby mobile devices that are Wi-Fi enabled.

  • Swarun Kumar, Ezzeldin Hamed, Dina Katabi, Li Erran Li

    Despite the rapid growth of next-generation cellular networks, researchers and end-users today have limited visibility into the performance and problems of these networks. As LTE deployments move towards femto and pico cells, even operators struggle to fully understand the propagation and interference patterns affecting their service, particularly indoors. This paper introduces LTEye, the first open platform to monitor and analyze LTE radio performance at a fine temporal and spatial granularity. LTEye accesses the LTE PHY layer without requiring private user information or provider support. It provides deep insights into the PHY-layer protocols deployed in these networks. LTEye’s analytics enable researchers and policy makers to uncover serious deficiencies in these networks due to inefficient spectrum utilization and inter-cell interference. In addition, LTEye extends synthetic aperture radar (SAR), widely used for radar and backscatter signals, to operate over cellular signals. This enables businesses and end-users to localize mobile users and capture the distribution of LTE performance across spatial locations in their facility. As a result, they can diagnose problems and better plan deployment of repeaters or femto cells. We implement LTEye on USRP software radios, and present empirical insights and analytics from multiple AT&T and Verizon base stations in our locality.

  • Ryan Craven, Robert Beverly, Mark Allman

    Understanding, measuring, and debugging IP networks, particularly across administrative domains, is challenging. One particularly daunting aspect of the challenge is the presence of transparent middleboxes—which are now common in today’s Internet. In-path middleboxes that modify packet headers are typically transparent to a TCP, yet can impact end-to-end performance or cause blackholes. We develop TCP HICCUPS to reveal packet header manipulation to both endpoints of a TCP connection. HICCUPS permits endpoints to cooperate with currently opaque middleboxes without prior knowledge of their behavior. For example, with visibility into end-to-end behavior, a TCP can selectively enable or disable performance enhancing options. This cooperation enables protocol innovation by allowing new IP or TCP functionality (e.g., ECN, SACK, Multipath TCP, Tcpcrypt) to be deployed without fear of such functionality being misconstrued, modified, or blocked along a path. HICCUPS is incrementally deployable and introduces no new options. We implement and deploy TCP HICCUPS across thousands of disparate Internet paths, highlighting the breadth and scope of subtle and hard to detect middlebox behaviors encountered. We then show how path diagnostic capabilities provided by HICCUPS can benefit applications and the network.

  • Fahad R. Dogar, Thomas Karagiannis, Hitesh Ballani, Antony Rowstron

    Many data center applications perform rich and complex tasks (e.g., executing a search query or generating a user’s news-feed). From a network perspective, these tasks typically comprise multiple flows, which traverse different parts of the network at potentially different times. Most network resource allocation schemes, however, treat all these flows in isolation – rather than as part of a task – and therefore only optimize flow-level metrics. In this paper, we show that task-aware network scheduling, which groups flows of a task and schedules them together, can reduce both the average as well as tail completion time for typical data center applications. To achieve these benefits in practice, we design and implement Baraat, a decentralized task-aware scheduling system. Baraat schedules tasks in a FIFO order but avoids head-of-line blocking by dynamically changing the level of multiplexing in the network. Through experiments with Memcached on a small testbed and large-scale simulations, we show that Baraat outperforms state-of-the-art decentralized schemes (e.g., pFabric) as well as centralized schedulers (e.g., Orchestra) for a wide range of workloads (e.g., search, analytics, etc.).
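
    A toy sketch of FIFO scheduling with a dynamic level of multiplexing, as the abstract describes (the threshold and task sizes are invented, and Baraat's actual policy is distributed and more involved): tasks are served in arrival order, but a heavy task at the head also admits the tasks behind it, so small tasks are not blocked behind it.

```python
# Hedged sketch of FIFO with a dynamic level of multiplexing.

HEAVY = 100.0   # a task larger than this is "heavy" (assumed threshold)

def admitted(tasks_fifo):
    """tasks_fifo: list of (task_id, remaining_size) in arrival order.
    Returns which tasks may share the network right now."""
    active = []
    for tid, size in tasks_fifo:
        active.append(tid)
        if size <= HEAVY:   # a light head-of-line task runs alone...
            break           # ...so it finishes fast (pure FIFO behavior)
        # heavy task: keep admitting, i.e., raise the multiplexing level
    return active

print(admitted([("t1", 500.0), ("t2", 20.0), ("t3", 30.0)]))  # t1, t2
print(admitted([("t1", 20.0), ("t2", 500.0)]))                # t1 only
```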

  • Tiffany Hyun-Jin Kim, Cristina Basescu, Limin Jia, Soo Bum Lee, Yih-Chun Hu, Adrian Perrig

    In-network source authentication and path validation are fundamental primitives to construct higher-level security mechanisms such as DDoS mitigation, path compliance, packet attribution, or protection against flow redirection. Unfortunately, currently proposed solutions either fall short of addressing important security concerns or require a substantial amount of router overhead. In this paper, we propose lightweight, scalable, and secure protocols for shared key setup, source authentication, and path validation. Our prototype implementation demonstrates the efficiency and scalability of the protocols, especially for software-based implementations.

  • Anirudh Sivaraman, Keith Winstein, Pratiksha Thaker, Hari Balakrishnan

    When designing a distributed network protocol, typically it is infeasible to fully define the target network where the protocol is intended to be used. It is therefore natural to ask: How faithfully do protocol designers really need to understand the networks they design for? What are the important signals that endpoints should listen to? How can researchers gain confidence that systems that work well on well-characterized test networks during development will also perform adequately on real networks that are inevitably more complex, or future networks yet to be developed? Is there a tradeoff between the performance of a protocol and the breadth of its intended operating range of networks? What is the cost of playing fairly with cross-traffic that is governed by another protocol? We examine these questions quantitatively in the context of congestion control, by using an automated protocol-design tool to approximate the best possible congestion-control scheme given imperfect prior knowledge about the network. We found only weak evidence of a tradeoff between operating range in link speeds and performance, even when the operating range was extended to cover a thousand-fold range of link speeds. We found that it may be acceptable to simplify some characteristics of the network—such as its topology—when modeling for design purposes. Some other features, such as the degree of multiplexing and the aggressiveness of contending endpoints, are important to capture in a model.

  • K.V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, Kannan Ramchandran

    Erasure codes such as Reed-Solomon (RS) codes are being extensively deployed in data centers since they offer significantly higher reliability than data replication methods at much lower storage overheads. These codes, however, demand significantly more network bandwidth and disk IO during reconstruction of data that is missing or otherwise unavailable. Existing solutions to this problem either demand additional storage space or severely limit the choice of the system parameters. In this paper, we present Hitchhiker, a new erasure-coded storage system that reduces both network traffic and disk IO by around 25% to 45% during reconstruction of missing or otherwise unavailable data, with no additional storage, the same fault tolerance, and arbitrary flexibility in the choice of parameters, as compared to RS-based systems. Hitchhiker "rides" on top of RS codes, and is based on novel encoding and decoding techniques that will be presented in this paper. We have implemented Hitchhiker in the Hadoop Distributed File System (HDFS). When evaluating various metrics on the data-warehouse cluster in production at Facebook with real-time traffic and workloads, during reconstruction, we observe a 36% reduction in the computation time and a 32% reduction in the data read time, in addition to the 35% reduction in network traffic and disk IO. Hitchhiker can thus reduce the latency of degraded reads and perform faster recovery from failed or decommissioned machines.

  • Jeongkeun Lee, Yoshio Turner, Myungjin Lee, Lucian Popa, Sujata Banerjee, Joon-Myung Kang, Puneet Sharma

    Providing bandwidth guarantees to specific applications is becoming increasingly important as applications compete for shared cloud network resources. We present CloudMirror, a solution that provides bandwidth guarantees to cloud applications based on a new network abstraction and workload placement algorithm. An effective network abstraction would enable applications to easily and accurately specify their requirements, while simultaneously enabling the infrastructure to provision resources efficiently for deployed applications. Prior research has approached the bandwidth guarantee specification by using abstractions that resemble physical network topologies. We present a contrasting approach of deriving a network abstraction based on application communication structure, called Tenant Application Graph or TAG. CloudMirror also incorporates a new workload placement algorithm that efficiently meets bandwidth requirements specified by TAGs while factoring in high availability considerations. Extensive simulations using real application traces and datacenter topologies show that CloudMirror can handle 40% more bandwidth demand than the state of the art (e.g., the Oktopus system), while improving high availability from 20% to 70%.

  • Dinesh Bharadia, Sachin Katti

    This paper presents FastForward (FF), a novel full-duplex relay that constructively forwards signals such that wireless network throughput and coverage are significantly enhanced. FF is a Layer 1 in-band full-duplex device: it receives and transmits signals directly and simultaneously on the same frequency. It cleanly integrates into existing networks (both WiFi and LTE) as a separate device and does not require changes to the clients. FF’s key invention is a constructive filtering algorithm that transforms the signal at the relay such that when it reaches the destination, it constructively combines with the direct signals from the source and provides a significant throughput gain. We prototype FF using off-the-shelf software radios running a stock WiFi PHY and show experimentally that it provides a 3× median throughput increase and nearly a 4× gain at the edge of the coverage area.

  • Navid Hamedazimi, Zafar Qazi, Himanshu Gupta, Vyas Sekar, Samir R. Das, Jon P. Longtin, Himanshu Shah, Ashish Tanwer

    This paper describes FireFly, a first but significant step toward realizing the vision of a flexible, all-wireless inter-rack fabric for datacenters. Each ToR switch is equipped with reconfigurable wireless links that can connect to other ToR switches. However, we need to look beyond traditional radio-frequency (RF) wireless solutions (e.g., 60 GHz), as their interference characteristics limit range and capacity. Thus, we envision a new use case for free-space optical (FSO) communications, which can offer high data rates (tens of Gbps) over long ranges using low transmission power and with zero interference [31]. The centralized FireFly controller reconfigures the topology and forwarding rules to adapt to changing traffic patterns. While prior work made the case for using FSO links in DCs [19, 28], it neither established a viable hardware design nor addressed the practical network design and management challenges that such an architecture entails.

  • Vivek Yenamandra, Kannan Srinivasan

    Global synchronization across time and frequency domains significantly benefits wireless communications. Multi-cell (network) MIMO, interference alignment solutions, opportunistic routing techniques in ad-hoc networks, and OFDMA all necessitate synchronization in the time domain, the frequency domain, or both. This paper presents Vidyut, a system that exploits the easily accessible and ubiquitous power line infrastructure to achieve synchronization in time and frequency domains across nodes distributed beyond a single collision domain. Vidyut uses the power lines to transmit a reference frequency tone to which each node locks its frequency. Vidyut exploits the steady periodicity of the delivered power signal itself to synchronize distributed nodes in time. We validate the extent of Vidyut’s synchronization and evaluate its effectiveness. We verify Vidyut’s suitability for wireless applications such as OFDMA and multi-cell MIMO by validating the benefits of global synchronization in an enterprise wireless network. Our experiments show a throughput gain of 8.2x over MegaMIMO, 7x over NEMOx, and 2.5x over OFDMA systems.

  • Jue Wang, Deepak Vasisht, Dina Katabi

    Prior work in RF-based positioning has mainly focused on discovering the absolute location of an RF source, where state-of-the-art systems can achieve an accuracy on the order of tens of centimeters using a large number of antennas. However, many applications in gaming and gesture-based interfaces benefit more from knowing the detailed shape of a motion. Such trajectory tracing requires a resolution several fold higher than what existing RF-based positioning systems can offer. This paper shows that one can provide a dramatic increase in trajectory tracing accuracy, even with a small number of antennas. The key enabler for our design is a multi-resolution positioning technique that exploits an intrinsic tradeoff between improving the resolution and resolving ambiguity in the location of the RF source. The unique property of this design is its ability to precisely reconstruct the minute details in the trajectory shape, even when the absolute position might have an offset. We built a prototype of our design with commercial off-the-shelf RFID readers and tags and used it to enable a virtual touch screen, which allows a user to interact with a desired computing device by gesturing or writing her commands in the air, where each letter is only a few centimeters wide.

  • Abhigyan Sharma, Xiaozheng Tie, Hardeep Uppal, Arun Venkataramani, David Westbrook, Aditya Yadav

    Mobile devices dominate the Internet today; however, the Internet, rooted in its tethered origins, continues to provide poor infrastructure support for mobility. Our position is that addressing this problem requires a massively scalable global name service that rapidly resolves identities to network locations under high mobility. Our primary contribution is the design, implementation, and evaluation of Auspice, a next-generation global name service that addresses this challenge. A key insight underlying Auspice is a demand-aware replica placement engine that intelligently replicates name records to provide low lookup latency, low update cost, and high availability. We have implemented a prototype of Auspice and compared it against several commercial managed DNS providers as well as state-of-the-art research alternatives, and shown that Auspice significantly outperforms both. We demonstrate proof-of-concept that Auspice can serve as a complete end-to-end mobility solution as well as enable novel context-based communication primitives that generalize name- or address-based communication in today’s Internet.

  • Yunpeng James Liu, Peter Xiang Gao, Bernard Wong, Srinivasan Keshav

    Most datacenter network (DCN) designs focus on maximizing bisection bandwidth rather than minimizing server-to-server latency. We explore architectural approaches to building low-latency DCNs and introduce Quartz, a design element consisting of a full mesh of switches. Quartz can be used to replace portions of either a hierarchical network or a random network. Our analysis shows that replacing high port-count core switches with Quartz can significantly reduce switching delays, and replacing groups of top-of-rack and aggregation switches with Quartz can significantly reduce congestion-related delays from cross-traffic. We overcome the complexity of wiring a complete mesh using low-cost optical multiplexers that enable us to efficiently implement a logical mesh as a physical ring. We evaluate our performance using both simulations and a small working prototype. Our evaluation results confirm our analysis, and demonstrate that it is possible to build low-latency DCNs using inexpensive commodity elements without significant concessions to cost, scalability, or wiring complexity.

  • Zhaoyu Gao, Arun Venkataramani, James F. Kurose, Simon Heimlicher

    This paper presents a quantitative methodology and results comparing different approaches for location-independent communication. Our approach is empirical and is based on real Internet topologies, routing tables from real routers, and a measured workload of the mobility of devices and content across network addresses today. We measure the extent of network mobility exhibited by mobile devices with a homebrewed Android app deployed on hundreds of smartphones, and measure the network mobility of Internet content from distributed vantage points. We combine this measured data with our quantitative methodology to analyze the different cost-benefit tradeoffs struck by location-independent network architectures with respect to routing update cost, path stretch, and forwarding table size. We find that more than 20% of users change over 10 IP addresses a day, suggesting that mobility is the norm rather than the exception, so intrinsic and efficient network support for mobility is critical. We also find that with purely name-based routing approaches, each event involving the mobility of a device or popular content may result in an update at up to 14% of Internet routers; but, the fraction of impacted routers is much smaller for the long tail of unpopular content. These results suggest that recent proposals for pure name-based networking may be suitable for highly aggregateable content that moves infrequently but may need to be augmented with addressing-assisted approaches to handle device mobility.

  • Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella

    Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk, and network. We present Tetris, a multi-resource cluster scheduler that packs tasks to machines based on their requirements of all resource types. Doing so avoids resource fragmentation as well as over-allocation of the resources that are not explicitly allocated, both of which are drawbacks of current schedulers. Tetris adapts heuristics for the multidimensional bin packing problem to the context of cluster schedulers, wherein task arrivals and machine availability change in an online manner and wherein a task's resource needs change with time and with the machine on which the task is placed. In addition, Tetris improves average job completion time by preferentially serving jobs that have less remaining work. We observe that fair allocations do not offer the best performance and that the above heuristics are compatible with a large class of fairness policies; hence, we show how to simultaneously achieve good performance and fairness. Trace-driven simulations and deployment of our Apache YARN prototype on a 250-node cluster show gains of over 30% in makespan and job completion time while achieving nearly perfect fairness.
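
    The packing heuristic admits a compact sketch (resource vectors and values invented; Tetris also folds in the remaining-work and fairness terms omitted here): score each feasible task/machine pair by the dot product of the task's demand vector and the machine's free-resource vector, and repeatedly place the highest-scoring pair.

```python
# Hedged sketch of alignment-based packing: high dot-product scores favor
# placements that use a machine's abundant resources, reducing
# fragmentation. Resource vectors are (cpu, mem, disk, net).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fits(demand, free):
    return all(d <= f for d, f in zip(demand, free))

def place(tasks, machines):
    """tasks: name -> demand vector; machines: name -> free vector."""
    placements, pending = [], dict(tasks)
    while pending:
        best = max(
            ((t, m) for t in pending for m in machines
             if fits(pending[t], machines[m])),
            key=lambda tm: dot(pending[tm[0]], machines[tm[1]]),
            default=None)
        if best is None:
            break                      # nothing fits; wait for frees
        t, m = best
        machines[m] = [f - d
                       for f, d in zip(machines[m], pending.pop(t))]
        placements.append((t, m))
    return placements

tasks = {"t1": [4, 8, 1, 1], "t2": [1, 1, 4, 4]}
machines = {"m1": [8, 16, 2, 2], "m2": [2, 2, 8, 8]}
print(place(tasks, machines))  # [('t1', 'm1'), ('t2', 'm2')]
```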

  • Yang Wu, Mingchen Zhao, Andreas Haeberlen, Wenchao Zhou, Boon Thau Loo

    When debugging a distributed system, it is sometimes necessary to explain the absence of an event – for instance, why a certain route is not available, or why a certain packet did not arrive. Existing debuggers offer some support for explaining the presence of events, usually by providing the equivalent of a backtrace in conventional debuggers, but they are not very good at answering “Why not?” questions: there is simply no starting point for a possible backtrace. In this paper, we show that the concept of negative provenance can be used to explain the absence of events in distributed systems. Negative provenance relies on counterfactual reasoning to identify the conditions under which the missing event could have occurred. We define a formal model of negative provenance for distributed systems, and we present the design of a system called Y! that tracks both positive and negative provenance and can use them to answer diagnostic queries. We describe how we have used Y! to debug several realistic problems in two application domains: software-defined networks and BGP interdomain routing. Results from our experimental evaluation show that the overhead of Y! is moderate.

  • Srikanth Kandula, Ishai Menache, Roy Schwartz, Spandana Raj Babbula

    Datacenter WAN traffic consists of high-priority transfers that have to be carried as soon as they arrive, alongside large transfers with preassigned deadlines on their completion. The ability to offer guarantees to large transfers is crucial for business needs and impacts overall cost-of-business. State-of-the-art traffic engineering solutions only consider the current time epoch or minimize maximum utilization and hence cannot provide pre-facto promises to long-lived transfers. We present Tempus, an online temporal planning scheme that appropriately packs long-running transfers across network paths and future timesteps, while leaving capacity slack for future changes. Tempus builds on a tailored approximate solution to a mixed packing-covering linear program, which is parallelizable and scales well in both running time and memory usage. Consequently, Tempus can quickly and effectively update the promised future flow allocation when new transfers arrive or unexpected changes happen. Our experiments on traces from a large production WAN show that Tempus can offer and keep promises to long-lived transfers well in advance of their actual deadlines; the promised minimal transfer size is comparable with that of an offline optimal solution and outperforms state-of-the-art solutions by 2-3X.

  • George Varghese

    The most compelling ideas in systems are abstractions such as virtual memory, sockets, or packet scheduling. Algorithmics is the servant of abstraction, allowing system performance to approach that of the underlying hardware, sometimes by using efficient algorithms but often by simply leveraging other aspects of the system. I will survey the trajectory of network algorithmics starting with a focus on speed and scale in the 1990s to measurement and security in the 2000s. While doing so, I will reflect on my experiences in choosing problems and conducting research. I will conclude by describing my passion for the emerging field of network verification and its confluence with programming language research.

  • Bo Zhang, Jinfan Wang, Xinyu Wang, Yingying Cheng, Xiaohua Jia, Jianfei He

    In the current Internet architecture, application service providers (ASPs) own users' data and social-group information, which has allowed a handful of ASP companies to grow ever larger and has kept small and medium companies out of this business. We propose a new architecture, called Application Independent Information Infrastructure (AI3). The design goals of AI3 are: 1) decoupling users' data and social relations from ASPs, so that ASPs become independent of both; and 2) an open architecture in which different ASPs can interoperate with each other. This demo shows a prototype of AI3 and has four parts: 1) ASP-independent data management in AI3; 2) ASP-independent management of users' social relations in AI3; 3) inter-domain data transport and user roaming; and 4) real-time communications using AI3. The demo video can be watched at: http://www.cs.cityu.edu.hk/~jia/AI3_DemoVideo.mp4

  • Dinesh Bharadia, Kiran Joshi, Sachin Katti

    This paper presents a demonstration of a real-time full-duplex point-to-point link, where transmission and reception occur in the same spectrum band simultaneously between a pair of full-duplex radios. The demo first builds a full-duplex radio by implementing a self-interference cancellation technique on top of a traditional half-duplex radio architecture. We then establish a point-to-point link using a pair of these radios that can transmit and receive OFDM packets. By changing the environmental conditions around the full-duplex radios, we then demonstrate the robustness of the self-interference cancellation in adapting to the changing environment.
