Debugging faults in complex networks often requires capturing and analyzing traffic at the packet level. In this task, datacenter networks (DCNs) present unique challenges with their scale, traffic volume, and diversity of faults. To troubleshoot faults in a timely manner, DCN administrators must a) identify affected packets within a large volume of traffic; b) track them across multiple network components; c) analyze traffic traces for fault patterns; and d) test or confirm potential causes. To our knowledge, no tool today can achieve both the specificity and scale required for this task.
Network faults such as link failures and high switch configuration delays can cause heavy congestion and packet loss. Because it takes time for the traffic engineering systems to detect and react to such faults, these conditions can persist for a long time, even tens of seconds. We propose forward fault correction (FFC), a proactive approach for handling faults. FFC spreads network traffic such that freedom from congestion is guaranteed under arbitrary combinations of up to k faults.
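The guarantee stated above (no congestion under any combination of up to k faults) can be illustrated with a brute-force check of a given traffic allocation. This is only a sketch under stated assumptions, not the FFC algorithm itself: it verifies an allocation rather than computing one, it models link failures only, and it assumes that a flow whose tunnels fail rescales its traffic proportionally onto its surviving tunnels. The function name `congestion_free` and the data layout are hypothetical.

```python
from itertools import combinations


def congestion_free(capacity, flows, k):
    """Return True if no surviving link is overloaded under any set of up to k failed links.

    capacity: dict mapping link name -> capacity (hypothetical layout)
    flows:    dict mapping flow name -> list of (path, rate) tunnels, where
              path is a tuple of link names and rate is the traffic on it
    Assumption: when tunnels fail, the flow keeps its total demand and splits
    it over the surviving tunnels in proportion to their original rates.
    """
    links = list(capacity)
    for n in range(k + 1):
        for failed in combinations(links, n):
            failed = set(failed)
            load = dict.fromkeys(links, 0.0)
            for tunnels in flows.values():
                demand = sum(rate for _, rate in tunnels)
                alive = [(p, r) for p, r in tunnels if not failed & set(p)]
                alive_rate = sum(r for _, r in alive)
                if alive_rate == 0:
                    continue  # flow is disconnected; it adds no load
                for path, rate in alive:
                    for link in path:
                        load[link] += demand * rate / alive_rate
            # Small tolerance guards against floating-point rounding.
            if any(load[l] > capacity[l] + 1e-9 for l in links if l not in failed):
                return False
    return True


# One flow of 10 units split evenly over two parallel links.
flows = {"f": [(("A",), 5.0), (("B",), 5.0)]}
print(congestion_free({"A": 10, "B": 10}, flows, k=1))
print(congestion_free({"A": 8, "B": 8}, flows, k=1))
```

With capacity 10 per link, either link can absorb the full demand after one failure, so the allocation passes for k = 1. With capacity 8 it passes only for k = 0. The enumeration grows combinatorially with k, so this check is suitable only for small examples.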
We present Dionysus, a system for fast, consistent network updates in software-defined networks. Dionysus encodes as a graph the consistency-related dependencies among updates at individual switches, and it then dynamically schedules these updates based on runtime differences in the update speeds of different switches. This dynamic scheduling is the key to its speed; prior update methods are slow because they pre-determine a schedule, which does not adapt to runtime conditions.
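The contrast between dynamic and pre-determined scheduling can be sketched with a small simulation. This is a simplified illustration, not the Dionysus implementation: the dependency format, the function names, and the fixed-round baseline (each round waits for its slowest update) are assumptions made for the example, and the dependency graph is assumed to be acyclic.

```python
import heapq


def dynamic_schedule(deps, duration):
    """Issue each update as soon as every update it depends on has finished.

    deps:     dict mapping update -> set of updates that must finish first;
              every update must appear as a key (hypothetical format)
    duration: dict mapping update -> time its switch takes to apply it
    Returns a dict mapping update -> finish time.
    """
    remaining = {u: set(d) for u, d in deps.items()}
    dependents = {u: [] for u in deps}
    for u, d in deps.items():
        for p in d:
            dependents[p].append(u)
    running = [(duration[u], u) for u in remaining if not remaining[u]]
    heapq.heapify(running)  # min-heap ordered by finish time
    finish = {}
    while running:
        now, u = heapq.heappop(running)
        finish[u] = now
        for v in dependents[u]:
            remaining[v].discard(u)
            if not remaining[v]:  # last dependency just finished
                heapq.heappush(running, (now + duration[v], v))
    if len(finish) != len(deps):
        raise ValueError("dependency cycle: some updates can never be issued")
    return finish


def static_round_makespan(deps, duration):
    """Baseline: group updates into fixed rounds by dependency depth.

    Each round starts only after the slowest update of the previous round.
    """
    level = {}

    def depth(u):
        if u not in level:
            level[u] = 1 + max((depth(p) for p in deps[u]), default=-1)
        return level[u]

    rounds = {}
    for u in deps:
        rounds[depth(u)] = max(rounds.get(depth(u), 0), duration[u])
    return sum(rounds.values())


deps = {"a": set(), "b": set(), "c": {"a"}, "d": {"b"}}
duration = {"a": 1, "b": 5, "c": 5, "d": 1}
print(max(dynamic_schedule(deps, duration).values()))  # 6
print(static_round_makespan(deps, duration))           # 10
```

In this example the dynamic schedule starts `c` as soon as the fast update `a` completes, rather than waiting for the slow update `b`, so all updates finish at time 6 instead of 10.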
We present Statesman, a network-state management service that allows multiple network management applications to operate independently, while maintaining network-wide safety and performance invariants. Network state captures various aspects of the network such as which links are alive and how switches are forwarding traffic. Statesman uses three views of the network state. In observed state, it maintains an up-to-date view of the actual network state. Applications read this state and propose state changes based on their individual goals.
We consider the potential for network trace analysis while providing the guarantees of “differential privacy.” While differential privacy provably obscures the presence or absence of individual records in a dataset, it has two major limitations: analyses must (presently) be expressed in a higher-level declarative language; and the analysis results are randomized before being returned to the analyst.
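The second limitation, randomized results, can be made concrete with a minimal sketch of the Laplace mechanism applied to a counting query. This is a standard textbook technique and not necessarily the mechanism used in the work described here; the function name `noisy_count`, the record layout, and the choice of `epsilon` are assumptions for illustration.

```python
import random


def noisy_count(records, predicate, epsilon, rng=random):
    """Return a count of matching records plus Laplace noise of scale 1/epsilon.

    Adding or removing one record changes a count by at most 1, so noise of
    scale 1/epsilon gives epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent exponential variables with rate
    # epsilon follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical trace: (source address, destination port) pairs.
trace = [("10.0.0.1", 80), ("10.0.0.2", 443), ("10.0.0.3", 80)]
print(noisy_count(trace, lambda pkt: pkt[1] == 80, epsilon=0.5))
```

The analyst sees only the noisy value, so repeated runs return different answers around the true count of 2. A smaller `epsilon` means stronger privacy and larger noise.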
This paper is based on a talk that I gave at CoNEXT 2009. Inspired by Hal Varian’s paper on building economic models, it describes a research method for building computer systems. I find this method useful in my work and hope that some readers will find it helpful as well.
By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagnose not only generic faults (e.g., performance-related) but also application-specific faults (e.g., error codes). It should also identify culprits at a fine granularity such as a process or firewall configuration. We build a system, called NetMedic, that enables detailed diagnosis by harnessing the rich information exposed by modern operating systems and applications.
We ask whether WiFi can provide connectivity from moving vehicles for common applications such as Web browsing and VoIP. Driven by this question, we conduct a study of connection quality available to vehicular WiFi clients based on measurements from testbeds in two different cities. We find that current WiFi handoff methods, in which clients communicate with one basestation at a time, lead to frequent disruptions in connectivity. We also find that clients can overcome many disruptions by communicating with multiple basestations simultaneously.
We present a novel approach to optimize the performance of IEEE 802.11-based multi-hop wireless networks. A unique feature of our approach is that it enables an accurate prediction of the resulting throughput of individual flows. At its heart lies a simple yet realistic model of the network that captures interference, traffic, and MAC-induced dependencies. Unless properly accounted for, these dependencies lead to unpredictable behaviors. For instance, we show that even a simple network of two links with one flow is vulnerable to severe performance degradation.
We study a fundamental yet under-explored facet in wireless communication – the width of the spectrum over which transmitters spread their signals, or the channel width. Through detailed measurements in controlled and live environments, and using only commodity 802.11 hardware, we first quantify the impact of channel width on throughput, range, and power consumption. Taken together, our findings make a strong case for wireless systems that adapt channel width. Such adaptation brings unique benefits.