A Comparative Look into Public IXP Datasets

R. Kloti, B. Ager, V. Kotronis, G. Nomikos, X. Dimitropoulos
Appears in: CCR, January 2016

Internet eXchange Points (IXPs) are core components of the Internet infrastructure where Internet Service Providers (ISPs) meet and exchange traffic. Over the last few years, the number and size of IXPs have increased rapidly, driving the flattening and shortening of Internet paths. However, understanding the present status of the IXP ecosystem and its potential role in shaping the future Internet requires rigorous data about IXPs: their presence, status, participants, etc. In this work, we perform the first cross-comparison of three well-known, publicly available IXP databases: PeeringDB, Euro-IX, and PCH. A key challenge we address is linking IXP identifiers across databases maintained by different organizations. We find that, as a result of their data collection approaches, the databases provide different views: AS-centric versus IXP-centric. In addition, we highlight differences and similarities with respect to IXP participants, geographical coverage, and co-location facilities. As a by-product of our linkage heuristics, we make publicly available the union of the three databases, which includes 40.2% more IXPs and 66.3% more IXP participants than the commonly used PeeringDB. We also publish our analysis code to foster reproducibility of our experiments, and we offer preliminary insights into the accuracy of the union dataset.
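The linkage step and the "more IXPs than PeeringDB" figure can be illustrated with a minimal sketch. It assumes, purely for illustration, that two records refer to the same IXP when their short names and cities match after normalization (lowercasing and dropping punctuation and spaces). This is not the authors' heuristic: the record type `IXPRecord`, the helpers `normalize`, `link_key`, `union`, and `percent_more`, and the sample records are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IXPRecord:
    """One IXP entry as listed by a single source (hypothetical schema)."""
    source: str  # e.g. "PeeringDB", "Euro-IX", or "PCH"
    name: str    # short name as spelled by that source
    city: str    # city as spelled by that source


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits, so 'Foo-IX' == 'foo ix'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def link_key(rec: IXPRecord) -> tuple[str, str]:
    """Assumed linkage rule: same normalized name AND same normalized city."""
    return (normalize(rec.name), normalize(rec.city))


def union(records: list[IXPRecord]) -> dict[tuple[str, str], set[str]]:
    """Merge records that share a link key; map each key to the sources listing it."""
    merged: dict[tuple[str, str], set[str]] = {}
    for rec in records:
        merged.setdefault(link_key(rec), set()).add(rec.source)
    return merged


def percent_more(union_size: int, base_size: int) -> float:
    """Relative growth of the union over one base source, in percent."""
    if base_size <= 0:
        raise ValueError("base_size must be positive")
    return 100.0 * (union_size - base_size) / base_size


if __name__ == "__main__":
    # Hypothetical records: one IXP spelled three ways, plus one seen only by PCH.
    records = [
        IXPRecord("PeeringDB", "Foo-IX", "Zurich"),
        IXPRecord("Euro-IX", "FOO IX", "Zurich"),
        IXPRecord("PCH", "foo-ix", "ZURICH"),
        IXPRecord("PCH", "Bar-IX", "Athens"),
    ]
    merged = union(records)
    in_peeringdb = sum(1 for srcs in merged.values() if "PeeringDB" in srcs)
    print(len(merged), in_peeringdb)                # 2 1
    print(percent_more(len(merged), in_peeringdb))  # 100.0
```

An exact-match key like this is deliberately conservative. It misses IXPs whose names or city spellings differ substantively across sources, and it cannot separate distinct IXPs that share both name and city, so some manual inspection would still be needed.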

Public Review By: Fabián E. Bustamante

The topic of Internet eXchange Points (IXPs) has attracted growing interest in the last few years as our community becomes aware of their increasingly central role in the Internet, from interdomain connectivity and overall network structure to performance and economics. Anyone starting to look at the topic (there’s plenty to take on) quickly becomes aware of three things: the dearth of publicly available information; the value of the three main datasets, from PeeringDB, the European Internet Exchange Association (Euro-IX), and the Packet Clearing House (PCH); and the pain that dealing with them entails. For the intrepid, this article should ease some of that pain. The authors present the first analysis and detailed cross-comparison of these three public datasets. They analyze data on between 500 and 700 IXPs in each of the sources and highlight the similarities, complementary information, and discrepancies they found. For those considering work in the area, the authors also share the combined data as well as the code used for data collection and analysis. Not everything is resolved, of course, and there is still much pain to cope with. The reviewers point out that the sources are far from rigorous and that manual inspection is needed, and they question the ability of the code to handle the discrepancies that exist between the different datasets. As the authors note, a fully automated approach to linking all IXPs may not be desirable given the many ambiguous mappings (one of my favorites: “SIX” occurs as an identifier, with minor variations, five times in PeeringDB, with the “S” standing alternatively for Seattle, Stuttgart, Slovak, Slovenia, and Stavanger, all of them different IXPs!). There is also potential linkage between these datasets, to an extent that is not yet clear, and there remains the clearly difficult problem of getting to the elusive ground truth. All said, this is a good, very welcome step forward, and the community should appreciate the effort.
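The “SIX” example shows why matching on short names alone is unsafe. A minimal sketch of one way to surface such cases for manual inspection groups the entries of a single source by normalized short name and flags any name that maps to more than one location. The spelling variants below are hypothetical stand-ins for the “minor variations” mentioned above, not the actual PeeringDB strings, and `normalize` and `ambiguous_names` are illustrative helpers, not part of the authors' published code.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and keep only ASCII letters and digits, so 'S-IX' and 'Six' collide."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def ambiguous_names(entries: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Return each normalized short name that maps to more than one location.

    entries: (short name, location label) pairs taken from a single source.
    """
    locations: dict[str, set[str]] = defaultdict(set)
    for name, location in entries:
        locations[normalize(name)].add(location)
    # Only names shared by two or more distinct locations need a human to look.
    return {key: locs for key, locs in locations.items() if len(locs) > 1}


if __name__ == "__main__":
    # Hypothetical spellings; the location labels follow the review's example.
    entries = [
        ("SIX", "Seattle"),
        ("S-IX", "Stuttgart"),
        ("SIX ", "Slovak"),
        ("Six", "Slovenia"),
        ("S.I.X.", "Stavanger"),
        ("Foo-IX", "Zurich"),
    ]
    for key, locs in ambiguous_names(entries).items():
        print(key, sorted(locs))
    # six ['Seattle', 'Slovak', 'Slovenia', 'Stavanger', 'Stuttgart']
```

Flagging rather than auto-resolving fits the observation that a fully automated linkage may not be desirable. Adding a second attribute such as location separates these five entries, but it would not help if two distinct IXPs shared both name and location.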