An Internet census taken by an illegal botnet - A qualitative assessment of published measurements

By: 
T. Krenc, O. Hohlfeld, A. Feldmann
Appears in: 
CCR July 2014

On March 17, 2013, an Internet census data set and an accompanying report were released by an anonymous author or group of authors. It created an immediate media buzz, mainly because of the unorthodox and unethical data collection methodology (i.e., exploiting default passwords to form the Carna botnet), but also because of the alleged unprecedented large scale of this census (even though legitimate census studies of similar and even larger sizes have been performed in the past). Given the unknown source of this released data set, little is known about it. For example, can it be ruled out that the data is faked? Or if it is indeed real, what is the quality of the released data? The purpose of this paper is to shed light on these and related questions and put the contributions of this anonymous Internet census study into perspective. Indeed, our findings suggest that the released data set is real and not faked, but that the measurements suffer from a number of methodological flaws and also lack adequate meta-data information. As a result, we have not been able to verify several claims that the anonymous author(s) made in the published report. In the process, we use this study as an educational example for illustrating how to deal with a large data set of unknown quality, hint at pitfalls in Internet-scale measurement studies, and discuss ethical considerations concerning third-party use of this released data set for publications.