DRoP: DNS-based Router Positioning

B. Huffaker, M. Fomenkov, K. Claffy
Appears in: CCR July 2014

In this paper we focus on geolocating Internet routers, using a methodology for extracting and decoding geography-related strings from fully qualified domain names (hostnames). We first compiled an extensive dictionary associating geographic strings (e.g., airport codes) with geophysical locations. We then searched a large set of router hostnames for these strings, assuming each autonomous naming domain uses geographic hints consistently within that domain. We used topology and performance data continually collected by our global measurement infrastructure to discern whether a given hint appears to co-locate the different hostnames in which it is found. Finally, we generalized geolocation hints into domain-specific rule sets. We generated a total of 1,711 rules covering 1,398 different domains and validated them using domain-specific ground truth we gathered for six domains. Unlike previous efforts, which relied on labor-intensive domain-specific manual analysis, we automate our process for inferring the domain-specific heuristics, substantially advancing the state of the art in methods for geolocating Internet resources.
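As a rough illustration of the dictionary-lookup step described in the abstract, the following Python sketch searches a hostname for known geographic strings. It is not the authors' implementation: the three-entry `GEO_HINTS` dictionary, the helper names `split_hostname` and `find_hints`, the tokenization rule, and the example hostname are all assumptions made for this example.

```python
import re

# Hypothetical, tiny dictionary of airport codes -> (place, latitude, longitude).
# The dictionary described in the paper is far more extensive.
GEO_HINTS = {
    "lax": ("Los Angeles, US", 33.94, -118.41),
    "jfk": ("New York, US", 40.64, -73.78),
    "lhr": ("London, GB", 51.47, -0.45),
}


def split_hostname(hostname, domain):
    """Return the alphabetic tokens found to the left of `domain` in `hostname`."""
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower()
    if not hostname.endswith("." + domain):
        return []
    prefix = hostname[: -len(domain) - 1]
    # Assumed tokenization: runs of letters, so "lax3" yields "lax".
    return re.findall(r"[a-z]+", prefix)


def find_hints(hostname, domain):
    """Return (token, location) pairs for every token present in GEO_HINTS."""
    return [
        (token, GEO_HINTS[token])
        for token in split_hostname(hostname, domain)
        if token in GEO_HINTS
    ]


# Made-up hostname in a made-up domain.
hints = find_hints("ae-1.cr2.lax3.example.net", "example.net")
```

A match found this way is only a candidate: a short string such as an airport code can occur in a hostname by coincidence, which is why the abstract describes a further check against topology and performance data.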

Public Review By: Joel Sommers

Identifying the geographic locations of Internet routers is an important and challenging issue for creating maps of network service provider points-of-presence and for identifying geographic characteristics of Internet routes. Since the work of Paxson [1], service provider DNS naming conventions have been used to infer geographic locations of Internet routers at the city level. Notably, this technique was used by Spring et al. in their influential Rocketfuel POP-level network mapping study [2]. Exploiting DNS naming conventions has a variety of pitfalls, such as different service providers using different naming conventions and the problem of disambiguating city names (e.g., does Cambridge refer to Massachusetts or the UK?). The technique has nonetheless been widely used as a basis for router geolocation and for inferring other router-level characteristics.
In the present paper by Huffaker, Fomenkov and claffy, the authors develop an algorithm called DRoP for automatically deriving general geolocation rules from a set of DNS names found in, for example, a large corpus of traceroute measurements collected by a project like CAIDA’s Archipelago, which sends traceroute probes from a fixed set of vantage points to every /24 in the IPv4 Internet. The algorithm proceeds by extracting naming hints based on various naming conventions (e.g., airport codes, city names), using knowledge of various abbreviations, misspellings, and other complications. Latency and hop-count measurements are then compiled from the traceroute measurements and used as input to a classifier that determines whether a geographic hint is likely to be valid or not. The resulting set of valid hints is generalized to develop a smaller set of more abstract rules. The authors use DRoP on a recent set of data collected through CAIDA’s Archipelago project and verify its inferences with operators from six different networks.
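To make the idea of validating a hint against latency measurements concrete, here is a minimal Python sketch of a speed-of-light feasibility test. It is a simpler stand-in, not the classifier used in the paper: it only rejects a hinted location when the measured round-trip time from a vantage point of known location is too small for a signal to have covered the distance. The function names, the coordinates, and the assumed propagation speed of about 200 km per millisecond in fiber are all assumptions of this example.

```python
import math

# Assumption: signals in fiber travel at roughly 2/3 the speed of light,
# i.e. about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def hint_is_feasible(vantage, hinted, rtt_ms):
    """Return False if rtt_ms is too small to reach the hinted location.

    `vantage` and `hinted` are (latitude, longitude) pairs in degrees.
    """
    distance_km = great_circle_km(vantage[0], vantage[1], hinted[0], hinted[1])
    min_rtt_ms = 2 * distance_km / FIBER_KM_PER_MS
    return rtt_ms >= min_rtt_ms


# Hypothetical vantage point near San Diego observing a 5 ms RTT to a router.
san_diego = (32.7, -117.2)
los_angeles_ok = hint_is_feasible(san_diego, (33.94, -118.41), 5.0)
london_ok = hint_is_feasible(san_diego, (51.47, -0.45), 5.0)
```

A test of this kind can only rule hints out. A large round-trip time is consistent with both nearby and distant routers, so it cannot confirm a hint by itself.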
The reviewers each thought that the proposed algorithm would be of significant use to the research community, especially if updated sets of naming rules are regularly generated and made publicly available. The use of latency and hop count measurements for validating naming inferences was viewed as a distinct and useful contribution. Since there are a variety of potential issues related to inferring information embedded in DNS names, the reviewers naturally questioned how the authors handled various pitfalls in DRoP, such as names appearing in a variety of natural languages and how city name ambiguities were resolved. Each reviewer also wondered about the relatively small fraction of names that had discernible geographic hints and the underlying reason(s) for such limited availability of geographic information. Overall, DRoP's ultimate value may lie in its future integration with existing measurement data sets and tools provided by CAIDA, which should be of great benefit to the research community.

[1] V. Paxson. Measurement and Analysis of End-to-end Internet Dynamics. PhD thesis, University of California at Berkeley, 1997.
[2] N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies with Rocketfuel.