Amin Vahdat

BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing

By: 
Alok Kumar, Sushant Jain, Uday Naik, Anand Raghuraman, Nikhil Kasinadhuni, Enrique Cauich Zermeno, C. Stephen Gunn, Jing Ai, Björn Carlin, Mihai Amarandei-Stavila, Mathieu Robin, Aspi Siganporia, Stephen Stuart, Amin Vahdat
Appears in: 
CCR August 2015

WAN bandwidth remains a constrained resource that is economically infeasible to substantially overprovision. Hence, it is important to allocate capacity according to service priority and based on the incremental value of additional allocation. For example, it may be the highest priority for one service to receive 10 Gb/s of bandwidth, but upon reaching such an allocation, incremental priority may drop sharply, favoring allocation to other services.

Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network

By: 
Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, Amin Vahdat
Appears in: 
CCR August 2015

We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale networks. Second, much of the general, but complex, decentralized network routing and management protocols supporting arbitrary deployment scenarios were overkill for single-operator, pre-planned datacenter networks.

Condor: Better Topologies Through Declarative Design

By: 
Brandon Schlinker, Radhika Niranjan Mysore, Sean Smith, Jeffrey C. Mogul, Amin Vahdat, Minlan Yu, Ethan Katz-Bassett, Michael Rubin
Appears in: 
CCR August 2015

The design space for large, multipath datacenter networks is large and complex, and no one design fits all purposes. Network architects must trade off many criteria to design cost-effective, reliable, and maintainable networks, and typically cannot explore much of the design space. We present Condor, our approach to enabling a rapid, efficient design cycle. Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures.

TIMELY: RTT-based Congestion Control for the Datacenter

By: 
Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi, Amin Vahdat, Yaogong Wang, David Wetherall, David Zats
Appears in: 
CCR August 2015

Datacenter transports aim to deliver low latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth.
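As a rough illustration of adjusting a sending rate from RTT gradients, the sketch below increases the rate additively while RTTs are flat or falling and backs off multiplicatively, in proportion to the normalized gradient, while RTTs are rising. It is a simplified sketch, not the algorithm as published: the function name `update_rate` and the parameters `add_step` and `beta` are assumptions chosen for the example, and details such as RTT smoothing or thresholds are left out.

```python
def update_rate(rate, prev_rtt, new_rtt, min_rtt, add_step=10.0, beta=0.8):
    """Return a new sending rate from two consecutive RTT samples.

    rate      current rate (any unit, e.g. Mb/s)
    prev_rtt  previous RTT sample (microseconds)
    new_rtt   latest RTT sample (microseconds)
    min_rtt   baseline RTT used to normalize the gradient (assumed known)
    add_step  additive increase per update (hypothetical value)
    beta      multiplicative decrease factor (hypothetical value)
    """
    # Normalized gradient: positive means queues are building up.
    gradient = (new_rtt - prev_rtt) / min_rtt
    if gradient <= 0:
        # RTT flat or shrinking: probe for more bandwidth.
        return rate + add_step
    # RTT growing: back off in proportion to how fast it grows (capped at 1).
    return rate * (1.0 - beta * min(gradient, 1.0))


rate = 100.0
rate = update_rate(rate, prev_rtt=50.0, new_rtt=50.0, min_rtt=20.0)  # flat RTT
rate = update_rate(rate, prev_rtt=50.0, new_rtt=60.0, min_rtt=20.0)  # rising RTT
print(rate)
```

The key design point is that the sender reacts to the change in delay rather than to its absolute value, so it can respond before queues grow large.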

DREAM: dynamic resource allocation for software-defined measurement

By: 
Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat
Appears in: 
CCR August 2014

Software-defined networks can enable a variety of concurrent, dynamically instantiated measurement tasks that provide fine-grain visibility into network traffic. Recently, there have been many proposals to configure TCAM counters in hardware switches to monitor traffic. However, the TCAM memory at switches is fundamentally limited and the accuracy of the measurement tasks is a function of the resources devoted to them on each switch.

Netshare and stochastic netshare: predictable bandwidth allocation for data centers

By: 
Vinh The Lam, Sivasankar Radhakrishnan, Rong Pan, Amin Vahdat, George Varghese
Appears in: 
CCR July 2012

Application performance in cloud data centers often depends crucially on network bandwidth, not just the aggregate data transmitted as in typical SLAs. We describe a mechanism for data center networks called NetShare that requires no hardware changes to routers but allows bandwidth to be allocated predictably across services based on weights. The weights are either specified by a manager, or automatically assigned at each switch port based on a virtual machine heuristic for isolation.

Public Review By: 
Sharad Agarwal

Tenants in datacenters desire performance isolation from each other. Such isolation for the network has been difficult to achieve without sacrificing utilization. This paper presents a set of techniques that together could achieve such isolation without requiring hardware changes in switches. The system is evaluated on a testbed of Fulcrum switches.

The techniques employed are as follows. On each switch, on each outbound link, a separate DRR queue is configured for each class of service. Tenants are clustered into these classes, and the weight of each class is the sum of the weights of the tenants. These weights are assigned by an operator when a tenant is provisioned. The traffic for each tenant is labeled so that it lands in the right queue. To handle UDP, each host needs a rate throttling shim. A centralized bandwidth allocator measures the rates of flows and then decides on new rates that are enforced using token bucket rate limiters at hosts or ingress switch ports.

There is a lot to absorb in this paper and the reviewers craved more details. One reviewer was concerned about how the system scales down to a small number of tenants because of a potential for bandwidth stealing, or how it scales to fast churn in tenants. Another was more concerned about the speed with which switch configurations could be updated. All the reviewers liked the paper. It is timely and the topic is important. The implementation on Fulcrum switches impressed them.

A general question worth pondering is what type of isolation the datacenter operator wants to offer, and what type tenants desire, and are those two in conflict? I suspect that one wants to offer proportional sharing of bandwidth, while the other wants minimum guaranteed bandwidths.
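The weighting scheme summarized in the review (class weight equals the sum of its tenants' weights, with bandwidth shared in proportion to weight) can be sketched in a few lines. This is an illustrative sketch only: the function names `class_weights` and `link_shares`, the tenant and class names, and the numbers are hypothetical, and it computes the ideal proportional shares rather than simulating DRR scheduling itself.

```python
from collections import defaultdict


def class_weights(tenant_weight, tenant_class):
    """Sum operator-assigned tenant weights into per-class weights."""
    weights = defaultdict(float)
    for tenant, weight in tenant_weight.items():
        weights[tenant_class[tenant]] += weight
    return dict(weights)


def link_shares(link_capacity, weights, active):
    """Ideal share of one outbound link for each class that has traffic.

    Only active (backlogged) classes compete, so an idle class's share
    is redistributed among the others in proportion to their weights.
    """
    total = sum(weights[c] for c in active)
    if total == 0:
        return {}
    return {c: link_capacity * weights[c] / total for c in active}


# Hypothetical tenants, weights, and class assignments.
tenant_weight = {"t1": 1, "t2": 2, "t3": 1}
tenant_class = {"t1": "gold", "t2": "gold", "t3": "silver"}

weights = class_weights(tenant_weight, tenant_class)
shares = link_shares(10.0, weights, active=["gold", "silver"])
print(weights, shares)
```

On a 10 Gb/s link with both classes active, the class holding tenants of total weight 3 receives 7.5 Gb/s and the class of weight 1 receives 2.5 Gb/s; if only one class has traffic, it can use the whole link.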

Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers

By: 
Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat
Appears in: 
CCR October 2010

The basic building block of ever larger data centers has shifted from a rack to a modular container with hundreds or even thousands of servers. Delivering scalable bandwidth among such containers is a challenge. A number of recent efforts promise full bisection bandwidth between all servers, though with significant cost, complexity, and power consumption.

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

By: 
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat
Appears in: 
CCR October 2009

This paper considers the requirements for a scalable, easily manageable, fault-tolerant, and efficient data center network fabric. Trends in multi-core processors, end-host virtualization, and commodities of scale are pointing to future single-site data centers with millions of virtual end points. Existing layer 2 and layer 3 network protocols face some combination of limitations in such a setting: lack of scalability, difficult management, inflexible communication, or limited support for virtual machine migration.

A Scalable, Commodity Data Center Network Architecture

By: 
Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat
Appears in: 
CCR October 2008

Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost.

Orbis: Rescaling Degree Correlations to Generate Annotated Internet Topologies

By: 
Priya Mahadevan, Calvin Hubble, Dmitri Krioukov, Bradley Huffaker, and Amin Vahdat
Appears in: 
CCR October 2007

Researchers involved in designing network services and protocols rely on results from simulation and emulation environments to evaluate correctness, performance and scalability. To better understand the behavior of these applications and to predict their performance when deployed across the Internet, the generated topologies that serve as input to simulated and emulated environments must closely match real network characteristics, not just in terms of graph structure (node interconnectivity) but also with respect to various node and link annotations.
