With steady improvement in the reliability and performance of communication devices, routing instabilities now contribute to many of the remaining service degradations and interruptions in modern networks. This has led to a renewed interest in centralized routing systems that, compared to distributed routing, can provide greater control over routing decisions and better visibility of the results. One benefit of centralized control is the opportunity to readily eliminate transient routing loops, which arise frequently after network changes because of inconsistent routing states across devices. Translating this conceptual simplicity into a solution with tolerable message complexity is non-trivial. Addressing this issue is the focus of this paper. We identify when and why avoiding transient loops might require a significant number of messages in a centralized routing system, and demonstrate that this is the case under many common failure scenarios. We also establish that minimizing the number of required messages is NP-hard, and propose a greedy heuristic that we show to perform well under many conditions. The paper’s results can facilitate the deployment and evaluation of centralized architectures by leveraging their strengths without incurring unacceptable overhead.
Distributed routing is hard. There are too many problems with it. Not to solve them is a “revolutionary clean slate” (henceforth, all quotes are from the reviewers’ comments). For example, let routing be no longer distributed, but again centralized -- 4D, for example. Routers no longer route, they just forward. A separate, centralized routing systems computes all the routes, translates them into FIBs, and uploads the latter to forwarders, reborn routers. Unfortunately, many problems remain, they just grow some “wrinkles”. One of such problems, characteristic of any routing, “centralized or not”, is transient loops and associated communication complexity, i.e., the number of control messages per network topology change. This paper deals with the “wrinkles” of this problem in the centralized settings.
Specifically, the problem formulation is as follows. For each destination, the forwarding paths in a steady state form a directed acyclic graph (DAG) containing all the forwarding elements (FEs) in the network. After a topology change, such DAG transforms into a new DAG. If the union of the old and new DAGs has cycles, then there exists at least one FE update ordering that would produce transient routing loops for the corresponding destination. Consider now the union, across all destinations, of the cycle parts present in the new DAGs. If this union has no its own cycles, which is the case, for example, for all single link or node failures, then the topological sorting of this union clearly yields an FE ordering that can be used to update each FE only once, for all destinations, and without forming any transient loops. However, if this union has cycles, then such ordering does not exist, and the task is to split all destinations into a minimum number of update groups, such that all FEs are updated once per group. The authors propose a simple greedy heuristic for this NP-hard problem.
All the four reviewers liked the paper. All four, however, believe that convergence time is an evaluation metric, which is more important, at least from the practical standpoint, than message complexity that the authors focus on. One reviewer finds the greedy approach too simplistic for this NP-hard problem: “Better heuristics are likely possible, but evaluation does not tell how far the algorithm is from the optimal solution” in cases when it can be found, e.g., for single link or node failures. The authors only “provide information about the reduction in the number of messages” compared to the naive approach, in which each FE is updated once per destination. Another reviewer expresses perhaps the most fundamental concern with the problem formulation itself. Why do “all FEs have to update their FIBs the same number of times as the number of destination groups”? Different FEs may have to be updated different numbers of times, thus reducing the overhead compared to the authors’ approach. In other words, “the approach in the paper is not a real message minimization approach. The restriction imposed by destination grouping is unnecessary.” (See “Loop-Free Updates of Forwarding Tables” by J. Fu, et al.).
In summary, the paper deals with one of the fundamental problems in routing, the minimization of convergence costs, this time in the centralized settings, and under the no-transient-loops constraint. The paper makes it clear that these costs cannot scale better than linearly with the network size, and proposes a simple, easy-to-implement heuristic to minimize them in cases when this scaling is worse than linear.