Modeling on quicksand: dealing with the scarcity of ground truth in interdomain routing data

By: 
Phillipa Gill, Michael Schapira, Sharon Goldberg
Appears in: 
CCR January 2012

Researchers studying the interdomain routing system, its properties and new protocols, face many challenges in performing realistic evaluations and simulations. Modeling decisions with respect to AS-level topology, routing policies and traffic matrices are complicated by a scarcity of ground truth for each of these components. Moreover, scalability issues arise when attempting to simulate over large (although still incomplete) empirically-derived AS-level topologies. In this paper, we discuss our approach for analyzing the robustness of our results to incomplete empirical data. We do this by (1) developing fast simulation algorithms that enable us to (2) running multiple simulations with varied parameters that test the sensitivity of our research results.

Public Review By: 
Yin Zhang

This paper focuses on improving the scalability and robustness of simulations for analyzing interdomain routing techniques. There are two challenges with inter-domain routing simulations (as outlined by the paper): (i) The running time of simulations on large AS graphs can be very high – O(|V|^3) for an AS graph with |V| vertices. For an empirical AS graph with 36K nodes, it will take several months to finish. (ii) The lack of ground truth information makes assessing the accuracy and robustness of routing techniques difficult. To address the first challenge, the author develops a novel routing tree algorithm that takes only O(|V|^2) time to compute paths between all source-destination pairs in an AS graph with |V| vertices, which is significantly faster than the state of the art. The algorithm exploits the fact that realworld AS graphs are typically very sparse, with only around 4*|V| edges as opposed to O(|V|^2) edges. It computes all the paths by performing a specialized three-stage breadth-first search (BFS) on the AS graph. To further improve the scalability in the context of repeated simulations, the paper develops lightweight faster amortized algorithms that achieve 5-times speedup compared to running repeated simulations. The idea is to run a single computation for all-pairs paths once and saving and reusing the intermediate results for subsequent iterations. Since the algorithms can be run independently across destinations, Map-reduce style parallelization is used to achieve another 200-times speedup. This is very good! Regarding robustness, the paper proposes to perform repeated simulations with varied parameters – this becomes computationally feasible because of the significantly reduced run-time. While it is arguable whether repeated simulations alone suffice to cope with the lack of ground truth, the approach helps better understand the impact of the imperfect data and modeling assumptions and is therefore clearly valuable. Overall, a very nice paper. The core ideas are both interesting and useful. The techniques proposed in the paper should really become the common practice in future large-scale simulation studies of interdomain routing.