Srikanth Kandula

Low Latency Geo-distributed Data Analytics

By: 
Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica
Appears in: 
CCR August 2015

Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the latency of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics.

Multi-resource packing for cluster schedulers

By: 
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella
Appears in: 
CCR August 2014

Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk and network. We present Tetris, a multi-resource cluster scheduler that packs tasks to machines based on their requirements of all resource types. Doing so avoids resource fragmentation as well as over-allocation of the resources that are not explicitly allocated, both of which are drawbacks of current schedulers.
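Packing tasks by all resource dimensions at once can be sketched with a simple alignment heuristic in the style Tetris uses: score each machine by the dot product of the task's demand vector and the machine's free-resource vector, and place the task where the score is highest. The vectors, values, and function names below are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of dot-product ("alignment") packing: place each task
# on the machine whose remaining resources best match the task's
# multi-dimensional demand, reducing fragmentation and over-allocation.

def alignment(task_demand, free):
    # Dot product rewards machines whose spare capacity aligns with the
    # task's demand profile across CPU, memory, disk, and network.
    return sum(d * f for d, f in zip(task_demand, free))

def pick_machine(task_demand, machines):
    # machines: list of free-resource vectors [cpu, mem, disk, net]
    fits = [m for m in machines if all(f >= d for f, d in zip(m, task_demand))]
    if not fits:
        return None  # no machine can host the task right now
    return max(fits, key=lambda m: alignment(task_demand, m))

# Example: a memory-heavy task lands on the memory-rich machine.
machines = [[4, 2, 2, 2], [2, 8, 2, 2]]
print(pick_machine([1, 4, 1, 1], machines))
```

The key point the abstract makes is that scoring all resource types together avoids the fragmentation that single-resource (e.g. slot- or CPU-based) schedulers suffer.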

Calendaring for wide area networks

By: 
Srikanth Kandula, Ishai Menache, Roy Schwartz, Spandana Raj Babbula
Appears in: 
CCR August 2014

Datacenter WAN traffic consists of high priority transfers that have to be carried as soon as they arrive, alongside large transfers with preassigned deadlines on their completion. The ability to offer guarantees to large transfers is crucial for business needs and impacts overall cost-of-business. State-of-the-art traffic engineering solutions only consider the current time epoch or minimize maximum utilization and hence cannot provide pre-facto promises to long-lived transfers.

Traffic engineering with forward fault correction

By: 
Hongqiang Harry Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, David Gelernter
Appears in: 
CCR August 2014

Network faults such as link failures and high switch configuration delays can cause heavy congestion and packet loss. Because it takes time for the traffic engineering systems to detect and react to such faults, these conditions can last long—even tens of seconds. We propose forward fault correction (FFC), a proactive approach for handling faults. FFC spreads network traffic such that freedom from congestion is guaranteed under arbitrary combinations of up to k faults.
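The k-fault guarantee can be illustrated with a brute-force check (not the paper's optimization formulation): enumerate every combination of up to k tunnel failures, rescale each flow onto its surviving tunnels, and verify no link exceeds capacity. All data structures and names here are assumptions for illustration.

```python
# Illustrative brute-force verifier for a congestion-free traffic spread
# under any k tunnel failures, assuming a flow's traffic is rescaled
# proportionally onto its surviving tunnels.
from itertools import combinations

def congestion_free(flows, cap, k):
    # flows: {flow: (demand, {tunnel: (links, weight)})}, weights sum to 1.
    all_tunnels = [(f, t) for f, (_, ts) in flows.items() for t in ts]
    for failed in combinations(all_tunnels, k):
        failed = set(failed)
        load = {l: 0.0 for l in cap}
        for f, (demand, ts) in flows.items():
            alive = {t: wt for t, (links, wt) in ts.items() if (f, t) not in failed}
            total = sum(alive.values())
            if total == 0:
                return False  # flow disconnected by this failure combination
            for t, wt in alive.items():
                for l in ts[t][0]:
                    load[l] += demand * wt / total
        if any(load[l] > cap[l] + 1e-9 for l in cap):
            return False
    return True

# One flow of 10 units split over two disjoint single-link tunnels:
# capacity-10 links survive any single tunnel failure.
flows = {"A": (10, {"t1": (("e1",), 0.5), "t2": (("e2",), 0.5)})}
print(congestion_free(flows, {"e1": 10, "e2": 10}, 1))  # True
```

FFC itself computes such a spread proactively; this sketch only checks one.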

Dynamic scheduling of network updates

By: 
Xin Jin, Hongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Jennifer Rexford, Roger Wattenhofer
Appears in: 
CCR August 2014

We present Dionysus, a system for fast, consistent network updates in software-defined networks. Dionysus encodes as a graph the consistency-related dependencies among updates at individual switches, and it then dynamically schedules these updates based on runtime differences in the update speeds of different switches. This dynamic scheduling is the key to its speed; prior update methods are slow because they pre-determine a schedule, which does not adapt to runtime conditions.
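The scheduling idea can be sketched as a toy event simulation (not Dionysus itself): updates form a dependency graph, and each update is issued the moment its prerequisites complete, so a slow switch delays only its dependents rather than the whole precomputed schedule. The names and durations below are hypothetical.

```python
# Toy sketch of dependency-aware dynamic scheduling: issue every switch
# update whose prerequisites have finished, instead of following a fixed
# precomputed order.
import heapq

def schedule(deps, duration):
    # deps: {update: set(prerequisite updates)}
    # duration: {update: time the switch takes to apply it}
    pending = {u: set(d) for u, d in deps.items()}
    finish, heap = {}, []  # heap holds (completion_time, update)

    def issue(now):
        # Launch every update whose dependencies are all satisfied.
        for u in [u for u, d in pending.items() if not d]:
            heapq.heappush(heap, (now + duration[u], u))
            del pending[u]

    issue(0.0)
    while heap:
        t, u = heapq.heappop(heap)
        finish[u] = t
        for d in pending.values():
            d.discard(u)  # u is done; unblock its dependents
        issue(t)
    return finish

# u3 depends on a fast update u1 and a slow one u2; it starts the instant
# u2 finishes, without waiting on any precomputed global ordering.
print(schedule({"u1": set(), "u2": set(), "u3": {"u1", "u2"}},
               {"u1": 1, "u2": 5, "u3": 1}))
```

Dionysus additionally chooses which ready updates to issue first to maximize progress; this sketch issues everything that is ready.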

VL2: A Scalable and Flexible Data Center Network

By: 
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta
Appears in: 
CCR October 2009

To be agile and cost effective, data centers should allow dynamic resource allocation across large server pools. In particular, the data center network should enable any server to be assigned to any service. To meet these goals, we present VL2, a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics.

Detailed Diagnosis in Enterprise Networks

By: 
Srikanth Kandula, Ratul Mahajan, Patrick Verkaik, Sharad Agarwal, Jitendra Padhye, and Paramvir Bahl
Appears in: 
CCR October 2009

By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagnose not only generic faults (e.g., performance-related) but also application specific faults (e.g., error codes). It should also identify culprits at a fine granularity such as a process or firewall configuration. We build a system, called NetMedic, that enables detailed diagnosis by harnessing the rich information exposed by modern operating systems and applications.

Towards Highly Reliable Enterprise Network Services via Inference of Multi-level Dependencies

By: 
Paramvir Bahl, Ranveer Chandra, Albert Greenberg, Srikanth Kandula, David A. Maltz, and Ming Zhang
Appears in: 
CCR October 2007

Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults.

Can You Hear Me Now?!: It Must Be BGP

By: 
Nate Kushman, Srikanth Kandula, and Dina Katabi
Appears in: 
CCR April 2007

Industry observers expect VoIP to eventually replace most of the existing land-line telephone connections. Currently however, quality and reliability concerns largely limit VoIP usage to either personal calls on cross-domain services such as Skype and Vonage, or to single-domain services such as trunking, where a core ISP carries long-distance voice as VoIP only within its backbone, to save cost with a unified voice/data infrastructure.

Public Review By: 
Jon Crowcroft

Voice over IP (VOIP) is now part of everyday life almost as much as e-mail and the web. We've been trying to get it to work for at least a quarter of a century. The voice funnel was an early device to packetize speech and was integrated into early ARPANET experiments by BBN. There is a direct line of descent from those experiments via the Network Voice Protocol, through to today's Realtime Transport Protocol.
Much of the work in the early days (indeed until the early 1990s) revolved around proposals to modify the Internet layer to provide QoS directly through the packet forwarding and end-to-end service interfaces. Thus the ST and ST-II protocols were developed, and RSVP, and a whole plethora of packet scheduling algorithms such as worst-case fair, weighted fair queueing and so forth, as well as their associated admission control algorithms, and, most relevant here, route pinning.
In the Inter-domain world in which we live, for the vast majority of end-to-end communications sessions, packets will traverse multiple ISPs. Even where some ISPs have deployed QoS or over-provisioned their links, a user cannot be assured of this in general. Thus their traffic may be subject to congestion or to re-routing. While web browsers are somewhat insensitive to variation in throughput or latency during a download, and e-mail users are really quite oblivious to it, VOIP users will perceive impact in the quality of experience directly from either congestion or from re-routing. Jitter, packet re-ordering, and packet loss are all things that VOIP applications are designed to cope with. However, there are limits to the amount that a play-out buffer can adapt before the user will simply hang up.
In the past, most work has concentrated on the impact of queuing on the play-out delay, loss concealment and hence resulting audio quality. This paper is one of the first to pin down the amount that inter-domain re-routing impacts VOIP. And it shows: BGP is a significant part of the problem. Indeed, the paper shows through systematic experimentation over a fairly large number of paths that chaos following BGP updates can account for as many as 50% of problems for VOIP calls, but worse, that these are the most serious problems and account, potentially, for 90% of dropped calls. There are limitations to an automated experimental approach, such as the one employed in this paper, that mean we cannot tell if this latter figure reflects actual user experience, but the model employed by the authors certainly supports a result in that sort of region.
This is a serious problem since it is unlikely that paths between users will suddenly traverse fewer ISPs (the ISP economic landscape cannot change that quickly, even if AT&T takes over most of the world). The BGP protocol world would seem to be the next place to look for solutions. Perhaps the IETF needs to consider some VOIP-aware approximate route pinning mechanism. Perhaps someone is already working on this, and will write a follow-up paper to explain the solutions.
We would like to hear from you, if you agree: We must fix BGP!

Dynamic Load Balancing Without Packet Reordering

By: 
Srikanth Kandula, Dina Katabi, Shantanu Sinha, and Arthur Berger
Appears in: 
CCR April 2007

Dynamic load balancing is a popular recent technique that protects ISP networks from sudden congestion caused by load spikes or link failures. Dynamic load balancing protocols, however, require schemes for splitting traffic across multiple paths at a fine granularity. Current splitting schemes present a tussle between slicing granularity and packet reordering. Splitting traffic at the granularity of packets quickly and accurately assigns the desired traffic share to each path, but can reorder packets within a TCP flow, confusing TCP congestion control.

Public Review By: 
Matthew Roughan

When there are multiple paths across a network it is common for network operators to use some form of load balancing. Load-balancing allows more flexible and efficient allocation of resources, and thereby extends the lifetime of a network. The trick in load balancing is to decide which packets take which path. Until now, this forwarding decision was made either per packet, with the result that reordering could occur within a flow (a potential problem for TCP performance), or at the flow level (e.g. based on the IP source and destination address of packets). Splitting traffic at the level of flows removes the problem of reordering, but at the cost of a restriction in the granularity with which we can split traffic. This paper presents a new approach, dubbed FLARE, that operates on bursts of packets (flowlets) carefully chosen to avoid reordering, while allowing a finer granularity of balancing.
The reviewers reported enjoying the paper, in particular the insight into how the bursty behaviour of TCP could be exploited by the load-balancer. They noted that the authors were sensitive to the amount of state information required (keeping this to a minimum), and that the authors’ proofs and evaluations of FLARE using trace-driven simulations were quite thorough.
However, the common limitations pointed to by all reviewers lie around the practical issues of such an approach. It is intrinsically linked to TCP traffic via the burstiness introduced by the congestion control. All of the presented work considered TCP traffic without interference from devices such as packet shapers, other sources of traffic, or multiple bottlenecks. It will be very interesting to see if the ideas in this paper are taken further, most importantly whether they are implemented and validated in practical settings.
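The flowlet idea can be sketched in a few lines, under stated assumptions: a flow may be moved to a new path only when the idle gap since its last packet exceeds delta, the maximum difference in path delays, so packets cannot be reordered. FLARE itself steers flowlets with a token-counting scheme to hit precise split ratios; the weighted random choice below is a stand-in, and all names are hypothetical.

```python
# Minimal flowlet-switching sketch in the spirit of FLARE: re-balance a
# flow onto a (possibly different) path only at idle gaps longer than
# delta, so in-flight packets on the old path cannot be overtaken.
import random

class FlowletSplitter:
    def __init__(self, paths, weights, delta):
        self.paths, self.weights, self.delta = paths, weights, delta
        self.last_seen = {}  # flow -> (time of last packet, chosen path)

    def route(self, flow, now):
        seen = self.last_seen.get(flow)
        if seen and now - seen[0] < self.delta:
            path = seen[1]   # same flowlet: must keep the same path
        else:
            # New flowlet: safe to pick any path (weighted stand-in for
            # FLARE's token-based assignment).
            path = random.choices(self.paths, self.weights)[0]
        self.last_seen[flow] = (now, path)
        return path

s = FlowletSplitter(["p1", "p2"], [0.7, 0.3], delta=0.05)
# Back-to-back packets of one flow always stay on one path:
first = s.route("f", 0.000)
assert s.route("f", 0.010) == first
```

The granularity win comes from TCP's burstiness: gaps longer than delta occur often enough that the splitter gets frequent safe opportunities to re-balance.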
