Discovering configuration templates of virtualized tenant networks in multi-tenancy datacenters via graph-mining

Yosuke Himura, Yoshiko Yasuda
Appears in: CCR July 2012

Multi-tenant datacenter networking, in which multiple customer (tenant) networks are virtualized over a single shared physical infrastructure, is cost-effective but imposes a significant manual configuration burden. Configuration templates would ease this burden, but creating appropriate (i.e., reusable) ones is difficult. In this work, we propose a graph-based method that mines the configurations of existing tenants to extract recurrent patterns that can serve as reusable templates for upcoming tenants. The effectiveness of the proposed method is demonstrated with actual configuration files obtained from a business datacenter network.

Public Review By: Sharad Agarwal

Network equipment and end systems are often manually configured by operators. At an ISP I used to work at, someone thankfully created a few templates by hand for common provisioning activities, and that saved us a lot of effort and time and spared us many bugs. Since then, papers have been published on creating such templates for ISPs and validating the resulting configurations. This paper examines the problem for virtual network topology configuration in multi-tenancy datacenters. Such datacenters are becoming more common, not just with IaaS, PaaS and SaaS providers, but also with corporate datacenters that might have multiple LoB applications running inside them. Is there enough commonality between configurations in this environment to warrant such templates? Can they be automatically generated? The authors use numbers and observations from a real datacenter to motivate the problem. They use a corpus of real configurations. They examine similarities between tenant topologies and cluster them. The automatically created templates are then manually customized by an operations engineer to provision a new tenant, and the paper shows reductions in configuration time. All the reviewers found the paper to be interesting and informative. There are a few open issues worth considering. The proposed system has a computational overhead that may not scale to large tenants: how big a problem is that, and how likely are we to see large tenants? About 50% of the customers are covered by 5 templates. Can we do better than 50%? Could the 5 templates have been generated manually?
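
The coverage question raised in the review (what share of tenants the few most common templates account for) can be illustrated with a small sketch. This is not the authors' mining algorithm: it swaps in a simple Weisfeiler-Lehman style label refinement to group tenant topologies that look structurally identical, then measures how many tenants the top-k groups cover. The node types ("fw", "vlan", "lb"), the tenant names and the functions wl_signature and template_coverage are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
import hashlib


def wl_signature(nodes, edges, rounds=3):
    """Weisfeiler-Lehman style signature of a labeled undirected graph.

    nodes: dict mapping node id -> type label (hypothetical types, e.g. "fw")
    edges: iterable of (u, v) node-id pairs
    Isomorphic graphs always get equal signatures; unequal graphs may
    occasionally collide, so this is a grouping heuristic, not a proof.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = dict(nodes)
    for _ in range(rounds):
        refined = {}
        for n in nodes:
            # Combine a node's label with the sorted labels of its neighbours.
            neigh = sorted(labels[m] for m in adj[n])
            raw = labels[n] + "|" + ",".join(neigh)
            refined[n] = hashlib.sha1(raw.encode()).hexdigest()[:12]
        labels = refined

    # The multiset of final labels ignores node ids, so renaming has no effect.
    return tuple(sorted(labels.values()))


def template_coverage(tenants, top_k):
    """Fraction of tenants whose topology falls in the top_k largest groups."""
    counts = Counter(wl_signature(n, e) for n, e in tenants.values())
    covered = sum(c for _, c in counts.most_common(top_k))
    return covered / len(tenants)


# Hypothetical tenants: t1 and t2 share a topology, t3 and t4 are one-offs.
tenants = {
    "t1": ({"a": "fw", "b": "vlan", "c": "vlan"}, [("a", "b"), ("a", "c")]),
    "t2": ({"x": "fw", "y": "vlan", "z": "vlan"}, [("x", "y"), ("x", "z")]),
    "t3": ({"f": "fw", "l": "lb", "v": "vlan"}, [("f", "l"), ("l", "v")]),
    "t4": ({"f": "fw", "v": "vlan"}, [("f", "v")]),
}

print(template_coverage(tenants, top_k=1))  # one template covers t1 and t2
print(template_coverage(tenants, top_k=2))
```

Exact-match grouping of this kind only finds tenants whose whole topology repeats. Extracting recurrent sub-patterns shared by otherwise different tenants, as the abstract describes, would need frequent-subgraph mining, which is more expensive and is where the scaling concern for large tenants would arise.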