As a member of the SIGCOMM TPC, I recently had a chance to read over thirty submissions by the best and brightest in our field. Imagine my surprise to find that all but one of them had significant flaws in statistical analysis. These flaws were severe enough that in many other disciplines the papers would have been summarily rejected. Yet, a few of them not only were accepted but also are likely to become role models for the next generation of students. For, though statistically flawed, they were not far from common practice, so it was felt that they should not be unfairly punished.
In this editorial, I will focus on why statistical analysis matters, three common statistical errors I saw, why I think we have a rather relaxed approach to statistical analysis, and what we can do about it.
In conducting experiments, only the most trivial situations allow a comprehensive exploration of the underlying parameter space: simulations and measurements alike allow us to explore only a small portion of the space. One role of statistical analysis is guide the selection of the parameter space using techniques from experimental design.
A second role of statistical analysis is to allow a researcher to draw cautious and justifiable conclusions from a mass of numbers. Over a hundred years of work has created tried-and-tested techniques that allow researchers to compensate for unavoidable measurement errors, and to infer with high probability that the improvement seen due a particular algorithm or system is significant, rather than due to mere luck. Without statistical analysis, one is on thin ice.
These two roles of statistical analysis make it an essential underpinning for networking research, especially for experimental design, measurement, and performance analysis. Unfortunately, despite its importance, papers in our field--both submissions to SIGCOMM and published papers in similar top-tier conferences--suffer from severe statistical errors. The three most common errors I found were: (1) confusing a sample with a population and, as a corollary, not specifying the underlying population (2) not presenting confidence intervals for sample statistics and (3) incorrect hypothesis testing.
Most authors did not seem to realize that their measurements represented a sample from a much larger underlying population. Consider the measured throughput during ten runs between two nodes using a particular wireless protocol. These values are a sample of the population of node-to-node throughputs obtained using all possible uses of the protocol under all conceivable circumstances. Wireless performance may, however, vary widely depending on the RF environment. Therefore, the sample can be considered to be representative only if every likely circumstance has a proportionate chance of being represented. Authors need to strongly argue that the sample measurements are chosen in a way that sufficiently covers the underlying parameter space. Otherwise, the sample represents nothing more than itself! Yet, this test for scientific validity was rarely discussed in most papers.
Given that a sample is not the population, it is imperative that the statistics of a sample be presented along with a confidence interval in which, with high confidence, the population parameters lie. These are the familiar error bars in a typical graph or histogram. Lacking error bars, we cannot interpret the characteristics of the population with any precision; we can only draw conclusions about the sample, which is necessarily limited. To my surprise, only one paper I read had error bars. This is a serious flaw.
Finally, it is axiomatic in statistical analysis that a hypothesis cannot be proved; it can only rejected or not rejected. Hypothesis testing requires carefully framing a null hypothesis and using standard statistical analysis to either reject or not reject it. I realize that in many cases the null hypothesis is obvious and need not be formally stated. Nevertheless, it appeared that none of the papers I read had tested hypotheses with adequate care.
Why do papers in our field lack statistical rigor? I think that one reason could be that we teach statistics too early in the academic curriculum. Students who learn statistical inference and hypothesis testing as a chore in a freshman class have all but forgotten it by the time they are writing papers. In my own case, I am embarrassed to admit that I did not thoroughly understand these techniques until I recently wrote a chapter on statistical techniques for networking researchers. I suspect that many of my colleagues are in the same boat. Unfortunately, this makes our weakness self-perpetuating. Having forgotten statistical analysis, we are neither in a position to carry it out properly, nor do we insist upon it during peer review. Thus, we stumble from one flawed paper to the next, continuing the cycle.
What can we do about this? I suggest that all graduate students be required to take a course on statistical analysis. This need not be a formal course, but could be taken online or using distance education. The concepts are well known and the techniques are explained in numerous textbooks. We just need to buy into the agenda. Second, I think that we need to raise the bar during paper evaluation. Inadequate statistical analysis should be pointed out and should form one criterion for paper rejection. For papers with good experimental results but with poor statistical analysis, we should insist that these issues be rectified during shepherding. Finally, we need to educate the educators. Perhaps SIGCOMM can sponsor online or offline tutorials where researchers can quickly come up to speed in statistical analysis.
These steps will raise the scientific merit of our discipline, and, more importantly, prevent us from accepting incorrect results due to flaws in statistical analysis.
When data transfers to or from a host happen in parallel, users do not always consider them to have the same importance. Ideally, a transport protocol should therefore allow its users to manipulate the fairness among ows in an almost arbitrary fashion. Since data transfers can also include real-time media streams which need to keep delay - and hence buffers - small, the protocol should also have a smooth sending rate. In an effort to satisfy the above requirements, we present MulTFRC, a congestion control mechanism which is based on the TCP-friendly Rate Control (TFRC) protocol. It emulates the behavior of a number of TFRC ows while maintaining a smooth sending rate. Our simulations and a real-life test demonstrate that MulTFRC performs significantly better than its competitors, potentially making it applicable in a broader range of settings than what TFRC is normally associated with.
This paper presents a statistical analysis of the amount of information that the features of traffic flows observed at the packet-level carry, with respect to the protocol that generated them. We show that the amount of information of the majority of such features remain constant irrespective of the point of observation (Internet core vs. Internet edge) and to the capture time (year 2000/01 vs. year 2008). We also describe a comparative analysis of how four statistical classifiers fare using the features we studied.
The Internet is a primary engine of innovation, communication, and entertainment in our society. It is used by diverse and potentially conflicting interests. When these interests collide in the form of technology, should the Internet stay true to its roots and let things take their course naturally, or should regulation and legislation be enacted to control what may or may not be done. This author believes public opinion is best served by light regulation and a strong focus on allowing technological innovation to take its natural course.
Although research on algorithms and communication pro- tocols in Wireless Sensor Networks (WSN) has yielded a tremendous effort so far, most of these protocols are hardly used in real deployments nowadays. Several reasons have been put forward in recent publications. In this paper, we further investigate this trend from a Medium Access Control (MAC) perspective by analyzing both the reasons behind successful deployments and the characteristics of the MAC layers proposed in the literature. Although research on algorithms and communication protocols in Wireless Sensor Networks (WSN) has yielded a tremendous effort so far, most of these protocols are hardly used in real deployments nowadays. Several reasons have been put forward in recent publications. In this paper, we further investigate this trend from a Medium Access Control (MAC) perspective by analyzing both the reasons behind successful deployments and the characteristics of the MAC layers proposed in the literature. The effort allocated to develop suitable protocols from scratch every new deployment could however be minimized by using already existing contributions which provide code reuse and adaptive protocols. Though we advocate their use for nowadays deployments, we have identified several shortcomings in foreseen scenarios for which we provide guidelines for future researches.
In this paper we discuss the current status of the Global Environment for Network Innovations. Early prototypes of GENI are starting to come online as an end-to-end system and network researchers are invited to participate by engaging in the design process or using GENI to conduct experiments.
A US-JapanWorkshop on Future Networks was held in Palo Alto, CA on October 31 - November 1, 2008. This workshop brought together leading US and Japanese network researchers and network research infrastructure developers who are interested in future networks. The focus was on research issues and experimental infrastructure to support research on future generation networks. The goal was to foster cooperation and communication between peers in the two countries. Through this workshop, a number of research challenges were identified. The workshop also made recommendations to: create a new funding mechanism to foster special collaborations in networking and experimentation, extend current national testbeds with international connectivity; and urge the respective governments to exert strong leadership to ensure that collaborative research for creating future networks is carried out.
As the community develops an ever more diverse set of venues for disseminating and discussing technical work and as the traditional resource constraints change from pages and shelf space to reviewer time, the traditional prohibition against double submission deserves closer consideration. We discuss reasons why double submissions have been frowned upon, and try to establish a set of guidelines that ensure appropriate review and dissemination, without preventing informal early discussion of research projects.
Selecting a technical program for a conference, and running the process so that decisions are well received by authors and participants, is a challenging task. We report our experience in running the SIGCOMM 2009 Technical Program Committee (TPC). The purpose of this article is to document the process that we followed, and discuss it critically. This should let authors get a better understanding of what led to the final acceptance or rejection of their work, and hopefully let other
“Don’t even get me started” is a powerful and direct, yet vague and meaningless, expression. What if you reply: “By all means, please, get started”. I believe that this will stun your opponent (it may be a friendly discussion, but if you don’t believe the other person is an opponent, you will never win.). Then, you can quickly grab his/her nose and yell: ”Twos before eights, and one bird in the bush”. It can be very entertaining unless the other person is your boss or your psychiatrist. In any case, this column dares to go where no column at CCR has gone before. It is all about getting started. And by that, we mean starting start-ups.
Warning: the content below is not suitable for small children or adults with common sense.