IP Geolocation with a Crowd-sourcing Broadband Performance Tool

Y. Lee, H. Park, Y. Lee
Appears in: 
CCR January 2016

In this paper, we propose a method for creating an IP geolocation DB from crowd-sourced Internet broadband performance measurements tagged with locations, and we present an IP geolocation DB based on 7 years of Internet broadband performance data in Korea. Compared with commercial IP geolocation DBs, our crowd-sourced IP geolocation DB shows higher accuracy at a finer granularity. We confirm that the low accuracy of commercial IP geolocation DBs mainly results from selecting a single representative location for a large IP block from the Whois registry DB, parsing city names in a naive way, and resolving the wrong geolocation coordinates. We also found that the geographic location of IP blocks has continuously changed but has been stable. Although our IP geolocation DB is limited to Korea, the 32 million broadband performance test records over 7 years provide wide coverage as well as fine-grained accuracy.

Public Review By: 
Fabian Bustamante

The geolocation of IP addresses (mapping an IP to the geographic location of the associated host) is important in a wide range of contexts, from targeted advertising to content distribution to (cyber)crime and punishment. As a result, a good number of IP geolocation services have been developed over the years, each building on either active measurements or passively collected datasets and generally trading off accuracy for scalability. The authors present a crowdsourced approach to get the accuracy of active measurement in a more scalable manner: they use a dataset from a Korean bandwidth testing utility that includes geolocation information to build a database and compare it with two commercial alternatives. They go a bit further, leveraging their data to point out some of the reasons for inaccuracies in the MaxMind and Akamai Edgescape services, ranging from parsing and translation problems with proper names to overdependence on coarse-grained Whois entries and limited visibility. The reviewers had a number of comments on early versions of this paper, from presentation issues to the limited novelty of the work and its high-level conclusions. A simple majority-based approach using crowd-collected information is not particularly new, and neither is the fact that commercial services perform poorly outside a few major countries. Still, the reviewers found the work interesting, as an evaluation of geo-IP databases from a different locale, and valuable, if mostly for the size and long-term view of the dataset used in the analysis: 32 million records and 7 years of Internet broadband performance data from Korea.
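To make the "simple majority-based approach" mentioned in the review concrete, here is a minimal sketch in Python. It is not the authors' implementation: the /24 block size, the `min_votes` threshold, the record format of `(ip, city)` pairs, and the function names `build_geo_db` and `lookup` are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
import ipaddress


def build_geo_db(records, prefix_len=24, min_votes=2):
    """Map each IP block to the location reported most often within it.

    records: iterable of (ip_string, city_label) pairs (hypothetical format).
    prefix_len and min_votes are illustrative assumptions, not values
    taken from the paper.
    """
    votes = defaultdict(Counter)
    for ip, city in records:
        # strict=False masks the host bits, yielding the enclosing block.
        block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        votes[block][city] += 1

    db = {}
    for block, counter in votes.items():
        city, count = counter.most_common(1)[0]
        # Skip blocks with too few agreeing reports to trust the label.
        if count >= min_votes:
            db[block] = city
    return db


def lookup(db, ip, prefix_len=24):
    """Return the majority location for the block containing ip, or None."""
    block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return db.get(block)


# Example with documentation-only address ranges and made-up reports.
sample = [
    ("192.0.2.10", "Seoul"),
    ("192.0.2.20", "Seoul"),
    ("192.0.2.30", "Busan"),
    ("198.51.100.5", "Daejeon"),
]
sample_db = build_geo_db(sample)
```

In this sketch a block with conflicting reports takes the most frequent label, and a block with fewer than `min_votes` agreeing reports is left out rather than guessed. A real system would also need to handle stale records and ties, which this example does not attempt.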