This document provides reports on the presentations at the SIGCOMM 2011 Conference, the annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM).
This editorial was motivated by a panel on the relationship between academia and industry at the SIGCOMM 2011 conference that was moderated by Bruce Davie. I can claim some familiarity with the topic having spent roughly ten years each in academia and industry during the last twenty years.
My thesis is that although industry can make incremental gains, real technical breakthroughs can only come from academia. However, to have any impact, these academic breakthroughs must be motivated, at some level, by a real-world problem, and the proposed solutions should be feasible, even if implausible. Therefore, it is in the self-interest of industry to fund risky, long-term, curiosity-driven academic research rather than sure-shot, short-term, practical research with well-defined objectives. Symmetrically, it is in the self-interest of academic researchers to tackle real-world problems motivated by the problems faced in industry and to propose reasonable solutions.
There are many underlying reasons why technological revolutions today can only come from academia. Perhaps the primary reason is that, unlike most industrial research labs of today (and I am purposely excluding the late, great Bell Labs of yore), academia still supports long-term, curiosity-driven research. This is both risky and an inherently 'wasteful' use of time. Yet this apparently wasteful work is the basis for many of today's technologies, ranging from Google search to the World Wide Web to BSD Unix and Linux. On closer thought, this is not too surprising. Short-term, practical research requires the investigator to have well-defined goals. But revolutionary ideas cannot be reduced to bullet items on PowerPoint slides: they usually arise as unexpected outcomes of curiosity-driven research. Moreover, it takes time for ideas to mature and for the inevitable missteps to be detected and corrected. Industrial funding cycles of six months to a year are simply not set up to fund ideas whose maturation can take five or even ten years. In contrast, academic research, built on the basis of academic tenure and unencumbered by the demands of the marketplace, is the ideal locus for long-term work.
Long-term, curiosity-driven research alone does not lead to revolutions. It must go hand-in-hand with an atmosphere of intellectual openness and rigour. Ideas must be freely exchanged and freely shot down.
The latest work should be widely disseminated and incorporated into one's thinking. This openness is antithetical to the dogma of 'Intellectual Property' by which most corporations are bound. Academia, thankfully, has mostly escaped from this intellectual prison. Moreover, industry is essentially incompatible with intellectual rigour: corporate researchers, by and large, cannot honestly comment on the quality of their own company's products and services.
A third ingredient in the revolutionary mix is the need for intense thinking by a dedicated group of researchers. Hands-on academic research tends to be carried out by young graduate students (under the supervision of their advisors) who are unburdened by either responsibilities or by knowing that something just cannot be done. Given training and guidance, given challenging goals, and given a soul-searing passion to make a difference in the world, a mere handful of researchers can do what corporate legions cannot.
These three foundations of curiosity-driven research, intellectual openness, and intense thinking set academic research apart from the industrial research labs of today and are also the reason why the next technological revolution is likely to come from academia, not industry.
In the foregoing, I admit that I have painted a rather rosy picture of academic research. It is important to recognize, however, that the same conditions that lead to breakthrough research are also susceptible to abuse. The freedom to pursue long-term ideas unconstrained by the marketplace can also lead to work that is shoddy and intellectually dishonest. For instance, I believe that it may be intellectually honest for a researcher to make assumptions that do not match current technology, but it is intellectually dishonest to make assumptions that violate the laws of physics. In a past editorial, I have written in more depth about these assumptions, so I will not belabour the point. I will merely remark here that it is incumbent on academic researchers not to abuse their freedom.
A second inherent problem with academic research, especially in the field of computer networking, is that it is difficult, perhaps impossible, to do large-scale data-driven research. As a stark example, curiosity-driven work on ISP topology is impossible if ISPs sequester this data. Similarly, studying large-scale data centre topology is challenging when the largest data centre one can build in academia has only a few hundred servers.
Finally, academic research tends to be self-driven and sometimes far removed from real-world problems. These real-world problems, which are faced daily by industrial researchers, can be intellectually demanding and their solution can be highly impactful. Academic researchers would benefit from dialogue with industrial researchers in posing and solving such problems.
Given this context, the relationship between academia and industry becomes relatively clear. What academia has and industry needs is committed, focussed researchers and the potential for long-term, revolutionary work. What industry has and academia needs is exposure to real-world problems, large-scale data and systems, and funding. Therefore, it would be mutually beneficial for each party to contribute to the other. Here are a few specific suggestions for how to do so.
First, industry should fund academic research without demanding concrete deliverables and unnecessary constraints. Of course, the research (and, in particular, the research assumptions) should be adequately monitored. But the overall expectation should be that academic work would be curiosity-driven, open, and long-term.
Second, industry should try to expose academic researchers to fundamental real-world problems and put at their disposal the data that is needed for their solution. If necessary, academic researchers should be given access to large-scale systems to try out their solutions. This can be done without loss of intellectual property by having students and PIs visit industrial research labs as interns or during sabbaticals. It could also be done by having industrial researchers spend several weeks or months as visitors to university research labs.
Third, industry should invest not only in funding, but also in internal resources to match the output of academic research (papers and prototypes) to its own needs (products and systems).
Fourth, academic researchers should choose research problems based not just on what is publishable, but (also) based on the potential for real-world impact. This would naturally turn them to problems faced by industry.
Fifth, academic researchers should ensure that their solutions are feasible, even if implausible. For instance, a wireless system for cognitive radio built on USRP boards is implausible but feasible. In contrast, a wireless system that assumes that all radio coverage areas are perfectly circular is neither plausible nor feasible. This distinction should be emphasized in the academic review of technical papers.
Finally, academic researchers should recognize the constraints under which industry operates and, to the extent possible, accommodate them. For instance, they should encourage students to take on internships, fight the inevitable battles with the university office of research to negotiate IP terms, and understand that their points of contact will change periodically due to the nature of corporate (re-)organizations.
The SIG can also help this interaction. Industry-academic fora such as the panel at SIGCOMM, industry-specific workshops, and industry desks at conferences allow academic researchers to interact with representatives from industry. SIGCOMM could also offer tutorials focussed on topics of current interest to industry. These two efforts would certainly make deep collaboration between academia and industry more likely.
I hope that these steps will move our community towards a future where academic research, though curiosity-driven, continues to drive real-world change because of its symbiotic relationship with industrial partners.
This editorial benefited from comments from Bruce Davie and Gail Chopiak.
Several traffic monitoring applications may benefit from the availability of efficient mechanisms for approximately tracking smoothed time averages rather than raw counts. This paper provides two contributions in this direction. First, our analysis of Time-decaying Bloom filters, formerly proposed data structures devised to perform approximate Exponentially Weighted Moving Averages on streaming data, reveals two major shortcomings: biased estimation when measurements are read at arbitrary time instants, and slow operation resulting from the need to periodically update all the filter's counters at once. We thus propose a new construction, called On-demand Time-decaying Bloom filter, which relies on a continuous-time operation to overcome the accuracy/performance limitations of the original window-based approach. Second, we show how this new technique can be exploited in the design of high-performance stream-based monitoring applications, by developing VoIPSTREAM, a proof-of-concept real-time analysis version of a formerly proposed system for telemarketing call detection. Our validation results, carried out over real telephony data, show how VoIPSTREAM closely mimics the feature extraction process and traffic analysis techniques implemented in the offline system, at a significantly higher processing speed, and without requiring any storage of per-user call detail records.
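The on-demand decay idea can be illustrated with a small sketch. In a real Time-decaying Bloom filter, keys are hashed into a shared array of counters, but the essential trick the abstract describes (decaying a counter lazily, at read or update time, instead of sweeping all counters at once) is captured by the following minimal Python class. The class and parameter names are my own illustrative choices, not the paper's:

```python
import math

class OnDemandDecayCounter:
    """Sketch of a lazily decayed counter: each cell stores (value, last_update)
    and applies exponential decay only when it is touched, rather than decaying
    every cell on a periodic timer."""

    def __init__(self, tau):
        self.tau = tau    # decay time constant (seconds)
        self.cells = {}   # key -> (value, time of last update)

    def _decayed(self, key, now):
        # Apply the decay accumulated since the cell was last touched.
        value, last = self.cells.get(key, (0.0, now))
        return value * math.exp(-(now - last) / self.tau)

    def add(self, key, amount, now):
        self.cells[key] = (self._decayed(key, now) + amount, now)

    def read(self, key, now):
        return self._decayed(key, now)
```

For example, a unit increment read again `tau` seconds later has decayed to 1/e of its value, and because decay is computed from timestamps, a read at any instant is unbiased, which is the property the window-based construction lacks.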
The Internet has changed dramatically in recent years. In particular, the fundamental change has occurred in terms of who generates most of the content, the variety of applications used and the diverse ways normal users connect to the Internet. These factors have led to an explosion of the amount of user-specific meta-information that is required to access Internet content (e.g., email addresses, URLs, social graphs). In this paper we describe a foundational service for storing and sharing user-specific meta-information and describe how this new abstraction could be utilized in current and future applications.
About ten years ago, Bob Lucky asked me for a list of open research questions in networking. I didn't have a ready list, and responded that it would be good to have one. This essay is my (long-belated) reply.
Should a new "platform" target a functionality-rich but complex and expensive design or instead opt for a bare-bone but cheaper one? This is a fundamental question with profound implications for the eventual success of any platform. A general answer is, however, elusive as it involves a complex trade-off between benefits and costs. The intent of this paper is to introduce an approach based on standard tools from the field of economics, which can offer some insight into this difficult question. We demonstrate its applicability by developing and solving a generic model that incorporates key interactions between platform stakeholders. The solution confirms that the "optimal" number of features a platform should offer strongly depends on variations in cost factors. More interestingly, it reveals a high sensitivity to small relative changes in those costs. The paper's contribution and motivation are in establishing the potential of such a cross-disciplinary approach for providing qualitative and quantitative insights into the complex question of platform design.
In June 2011 I participated in a panel on network neutrality hosted at the June cybersecurity meeting of the DHS/SRI Infosec Technology Transition Council (ITTC), where "experts and leaders from the government, private, financial, IT, venture capitalist, and academia and science sectors came together to address the problem of identity theft and related criminal activity on the Internet." I recently wrote up some of my thoughts on that panel, including what network neutrality has to do with cybersecurity.
I recently published this essay on CircleID on my thoughts on ICANN's recent decision to launch .XXX and the larger new gTLD program this year. Among other observations, I describe how .XXX marks a historical inflection point, where ICANN's board formally abandoned any responsibility to present an understanding of the ramifications of probable negative externalities ("harms") in setting its policies. That ICANN chose to relinquish this responsibility puts the U.S. government in the awkward position of trying to tighten the few inadequate controls that remain over ICANN, and leaves individual and responsible corporate citizens in the unenviable yet familiar position of bracing for the consequences.
This edition of CCR bears the dubious distinction of having no technical articles, only editorial content. This is not because no technical articles were submitted: in fact, there were 13 technical submissions. However, all of them were rejected by the Area Editors on the advice of the reviewers, a decision with which I expressed concern, but which I could not, in good conscience, overturn.
One could ask: were all the papers so terrible? Certainly some papers were unacceptably bad and some were simply out of scope. However, the fate of most papers was to be judged to be not good enough to publish. Some submissions were too broad, others too narrow, many were too incremental, some too radical, and some were just not interesting enough. The opposite of a Procrustean bed, CCR has become a bed that no paper seems to fit!
This, by itself, would normally not cause me too much concern. However, I feel that this attitude has permeated our community at large. A similar spirit of harsh criticism is used to judge papers at SIGCOMM, MOBICOM, CoNEXT, and probably every other top-tier computer science conference. Reviewers seem only to want to find fault with papers, rather than appreciate insights despite inevitable errors and a lack of technical completeness.
I think that a few all-too-human foibles lie at the bottom of this hyper-critical attitude of paper reviewers. First, a subconscious desire to get one's own back: if my paper has been rejected from a venue due to sharp criticism, why not pay this back with sharp criticism of my own? Second, a desire to prove one's expertise: if I can show that a paper is not perfect, that shows how clever I am. Third, a biased view of what papers in a particular area should look like: I'm the expert in my field, so I think I know what every paper in my field should look like! Finally, unrealistic expectations: I may not write perfect papers, but I expect to read only perfect ones. I think I have a good understanding of the psychological basis of reviewer nitpicking because I too am guilty of these charges.
These subconscious attitudes are exacerbated by two other factors: a ballooning of reviewer workloads, and, with journals in computer science languishing in their roles, conference papers being held to archival standard. These factors force reviewers into looking for excuses to reject papers, adding momentum to the push towards perfection. As the quote from Voltaire shows, this has negative consequences.
One negative consequence is the stifling of innovation. Young researchers learn that to be successful in publishing in top-tier venues, it pays to stick to well-established areas of research, where reviewers cannot fault them in their assumptions, because these already appear in the published literature. Then, they scale the walls by adding epsilon to delta until the incrementality threshold is breached. This has an opportunity cost in that well-studied areas are further overstudied to the detriment of others.
A second negative consequence is that it turns some researchers off. They simply do not want to take part in a game where they cannot respect the winners or the system. This has an even greater opportunity cost.
How can we address this problem? As PC chairs and Area Editors, we need to set the right expectations with reviewers. No paper will be perfect: that is a given. We have to change our mental attitude from finding reasons to reject a paper to finding reasons to accept a paper. We will certainly be trying to do this from now on at CCR.
We can also remove the notion of a publication bar altogether. An online version of CCR, which will be coming some day, could easily accept all articles submitted to it. Editors and reviewers could rank papers and write public reviews, and readers could judge whether or not to read a paper. This is already common practice in physics, using the arXiv system.
Finally, I would urge readers to look within. As a reviewer of a paper, it is your duty to critique a paper and point out its flaws. But can you overlook minor flaws and find the greater good? In some cases, I hope your answer will be yes. And with this small change, the system will also change. One review at a time.
While computer networking is an exciting research field, we are far from having a clear understanding of the core concepts and questions that define our discipline. This position paper, a summary of a talk I gave at the CoNext’10 Student Workshop, captures my current frustrations and hopes about the field.
The development of new technology is driven by scientific research. The Internet, with its roots in the ARPANET and NSFNet, is no exception. Many of the fundamental, long-term improvements to the architecture, security, end-to-end protocols and management of the Internet originate in the related academic research communities. Even shorter-term, more commercially driven extensions are oftentimes derived from academic research. When interoperability is required, the IETF standardizes such new technology. Timely and relevant standardization benefits from continuous input and review from the academic research community.
For an individual researcher, however, it can be quite puzzling how to most effectively participate in the IETF (and, arguably to a much lesser degree, in the IRTF). The interactions in the IETF are quite different from those at academic conferences, and effective participation follows different rules. The goal of this document is to highlight such differences and provide a rough guideline that will hopefully enable researchers new to the IETF to become successful contributors more quickly.
Electronic social networks are a relatively new, pervasive phenomenon that has changed the way in which we communicate and interact. They are now supporting new applications, leading to new trends and posing new challenges. The workshop titled "Future of Social Networking: Experts from Industry and Academia" took place in Cambridge on November 18, 2010 to explore how the future of social networking may develop and be exploited in new technologies and systems. We provide a summary of this event and some observations on the key outcomes.
We argue that the biggest problem with the current Internet architecture is not a particular functional deficiency, but its inability to accommodate innovation. To address this problem we propose a minimal architectural “framework” in which comprehensive architectures can reside. The proposed Framework for Internet Innovation (FII) — which is derived from the simple observation that network interfaces should be extensible and abstract — allows for a diversity of architectures to coexist, communicate, and evolve. We demonstrate FII’s ability to accommodate diversity and evolution with a detailed examination of how information flows through the architecture and with a skeleton implementation of the relevant interfaces.
On February 10-12, 2011, CAIDA hosted the third Workshop on Active Internet Measurements (AIMS-3) as part of our series of Internet Statistics and Metrics Analysis (ISMA) workshops. As with the previous two AIMS workshops, the goals were to further our understanding of the potential and limitations of active measurement research and infrastructure in the wide-area Internet, and to promote cooperative solutions and coordinated strategies to address future data needs of the network and security research communities. For three years, the workshop has fostered interdisciplinary conversation among researchers, operators, and government, focused on analysis of goals, means, and emerging issues in active Internet measurement projects. The first workshop emphasized discussion of existing hardware and software platforms for macroscopic measurement and mapping of Internet properties, in particular those related to cybersecurity. The second workshop included more performance evaluation and data-sharing approaches. This year we expanded the workshop agenda to include active measurement topics of more recent interest: broadband performance; gauging IPv6 deployment; and measurement activities in international research networks.
Exhaustion of the Internet Assigned Numbers Authority's (IANA) available IPv4 address space, which occurred in February 2011, is finally exerting exogenous pressure on network operators to begin to deploy IPv6. There are two possible outcomes from this transition. IPv6 may be widely adopted and embraced, rendering many existing methods to measure and monitor the Internet ineffective. Alternatively, IPv6 may languish, transition mechanisms may fail, or performance may suffer. Either scenario requires data, measurement, and analysis to inform technical, business, and policy decisions. We survey available data that have allowed limited tracking of IPv6 deployment thus far, describe additional types of data that would support better tracking, and offer a perspective on the challenging future of IPv6 evolution.
Twenty years ago, when I was still a graduate student, going online meant firing up a high-speed 1200 baud modem and typing text on a Z19 glass terminal to interact with my university’s VAX 11/780 server. Today, this seems quaint, if not downright archaic. Fast forwarding twenty years from now, it seems very likely that reading newspapers and magazines on paper will seem equally quaint, if not downright wasteful. It is clear that the question is when, not if, CCR goes completely online.
CCR today provides two types of content: editorials and technical articles. Both are selected to be relevant, novel, and timely. By going online only, we would certainly not give up these qualities. Instead, by not being tied to the print medium, we could publish articles as they were accepted, instead of waiting for a publication deadline. This would reduce the time-to-publication from the current 16 weeks to less than 10 weeks, making the content even more timely.
Freeing CCR from print has many other benefits. We could publish content that goes well beyond black-and-white print and graphics. For example, graphs and photographs in papers would no longer have to be black-and-white. But that is not all: it would be possible, for example, to publish professional-quality videos of paper presentations at the major SIG conferences. We could also publish and archive the software and data sets for accepted papers. Finally, it would allow registered users to receive alerts when relevant content was published. Imagine the benefits from getting a weekly update from CCR with pointers to freshly-published content that is directly relevant to your research!
These potential benefits can be achieved at little additional cost and using off-the-shelf technologies. They would, however, significantly change the CCR experience for SIG members. Therefore, before we plunge ahead, we’d like to know what you think. Do send your comments to me at: firstname.lastname@example.org
Many papers explain the drop in download performance when two TCP connections in opposite directions share a common bottleneck link by ACK compression, the phenomenon in which download ACKs arrive in bursts so that TCP self-clocking breaks. Efficient mechanisms to cope with this performance problem exist, so we do not propose yet another solution. Rather, we thoroughly analyze the interactions between connections and show that ACK compression actually arises only in a perfectly symmetrical setup and has little impact on performance. We provide a different explanation of the interactions: data pendulum, a core phenomenon that we analyze in this paper. In the data pendulum effect, data and ACK segments alternately fill only one of the link buffers (on the upload or download side) at a time, but almost never both of them. We analyze the effect in the case in which buffers are structured as arrays of bytes and derive an expression for the ratio between the download and upload throughput. Simulation results and measurements confirm our analysis and show how appropriate buffer sizing alleviates performance degradation. We also consider the case of buffers structured as arrays of packets and show that it amplifies the effects of data pendulum.
While analyzing CAIDA Internet traces of TCP traffic to detect instances of data reneging, we frequently observed seven misbehaviors in the generation of SACKs. These misbehaviors could result in a data sender mistakenly thinking data reneging occurred. With one misbehavior, the worst case could result in a data sender receiving a SACK for data that was transmitted but never received. This paper presents a methodology and its application to test a wide range of operating systems using TBIT to fingerprint which ones misbehave in each of the seven ways. Measuring the performance loss due to these misbehaviors is outside the scope of this study; the goal is to document the misbehaviors so they may be corrected. One can conclude that the handling of SACKs, while simple in concept, is complex to implement.
This paper presents the results of an investigation into the application flow control technique utilised by YouTube. We reveal and describe the basic properties of YouTube application flow control, which we term block sending, and show that it is widely used by YouTube servers. We also examine how the block sending algorithm interacts with the flow control provided by TCP and reveal that the block sending approach was responsible for over 40% of packet loss events in YouTube flows in a residential DSL dataset and the retransmission of over 1% of all YouTube data sent after the application flow control began. We conclude by suggesting that changing YouTube block sending to be less bursty would improve the performance and reduce the bandwidth usage of YouTube video streams.
In low-power wireless networks, nodes need to duty cycle their radio transceivers to achieve a long system lifetime. Counter-intuitively, in such networks broadcast becomes expensive in terms of energy and bandwidth since all neighbors must be woken up to receive broadcast messages. We argue that there is a class of traffic for which broadcast is overkill: periodic redundant transmissions of semi-static information that is already known to all neighbors, such as neighbor and router advertisements. Our experiments show that such traffic can account for as much as 20% of the network power consumption. We argue that this calls for a new communication primitive and present politecast, a communication primitive that allows messages to be sent without explicitly waking neighbors up. We have built two systems based on politecast: a low-power wireless mobile toy and a full-scale low-power wireless network deployment in an art gallery and our experimental results show that politecast can provide up to a four-fold lifetime improvement over broadcast.
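The claim that broadcast becomes expensive under duty cycling can be made concrete with a toy energy model. All function names and numbers below are illustrative assumptions of mine, not measurements from the paper: broadcast must wake every sleeping neighbor, while a politecast-style transmission reaches only those neighbors that happen to be awake.

```python
def broadcast_energy(n_neighbors, tx_cost, wakeup_cost, rx_cost):
    # Broadcast must wake every duty-cycled neighbor to deliver the message,
    # so each neighbor pays both a wakeup and a reception cost.
    return tx_cost + n_neighbors * (wakeup_cost + rx_cost)

def politecast_energy(n_neighbors, tx_cost, rx_cost, awake_fraction):
    # Politecast transmits without explicitly waking anyone; only neighbors
    # that happen to be awake (roughly the duty cycle, awake_fraction)
    # spend energy receiving.
    return tx_cost + n_neighbors * awake_fraction * rx_cost
```

With, say, ten neighbors at a 1% duty cycle and unit transmit/receive costs, broadcast cost is dominated by wakeups, which is exactly the overhead politecast avoids for semi-static, already-known information.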
Virtualizing and sharing networked resources have become a growing trend that reshapes the computing and networking architectures. Embedding multiple virtual networks (VNs) on a shared substrate is a challenging problem on cloud computing platforms and large-scale sliceable network testbeds. In this paper we apply the Markov Random Walk (RW) model to rank a network node based on its resource and topological attributes. This novel topology-aware node ranking measure reflects the relative importance of the node. Using node ranking we devise two VN embedding algorithms. The first algorithm maps virtual nodes to substrate nodes according to their ranks, then embeds the virtual links between the mapped nodes by finding shortest paths with unsplittable paths and solving the multi-commodity flow problem with splittable paths. The second algorithm is a backtracking VN embedding algorithm based on breadth-first search, which embeds the virtual nodes and links during the same stage using node ranks. Extensive simulation experiments show that the topology-aware node rank is a better resource measure and the proposed RW-based algorithms increase the long-term average revenue and acceptance ratio compared to the existing embedding algorithms.
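As a rough illustration of a topology-aware, random-walk node rank of the kind the abstract describes, the sketch below mixes a node's own resources (CPU capacity times adjacent bandwidth) with rank flowing in from neighbors, in the style of a random walk with restart. This is a generic reconstruction under my own assumptions, not the paper's actual formulation:

```python
def rw_node_rank(cpu, links, d=0.85, iters=200):
    """cpu: {node: CPU capacity}; links: {(u, v): bandwidth}, undirected.
    Returns a PageRank-style rank reflecting both resources and topology."""
    adj = {u: {} for u in cpu}
    for (u, v), b in links.items():
        adj[u][v] = b
        adj[v][u] = b
    # Base resource measure: CPU capacity times total adjacent bandwidth,
    # normalized to a probability distribution over nodes.
    base = {u: cpu[u] * sum(adj[u].values()) for u in cpu}
    total = sum(base.values())
    base = {u: base[u] / total for u in cpu}
    rank = dict(base)
    for _ in range(iters):
        new = {}
        for u in cpu:
            # With prob. d the walk arrives from a neighbor v, picking the
            # link (v, u) with probability proportional to its bandwidth;
            # with prob. 1 - d it restarts at a resource-weighted node.
            inflow = sum(rank[v] * adj[v][u] / sum(adj[v].values())
                         for v in adj[u])
            new[u] = (1 - d) * base[u] + d * inflow
        rank = new
    return rank
```

On a small star topology, a well-provisioned hub node ends up ranked above its leaves, which is the intuition behind mapping the largest virtual nodes to the highest-ranked substrate nodes first.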
Research on and promotion of the next-generation Internet have drawn the attention of researchers in many countries. In the USA, the FIND initiative takes a clean-slate approach. In the EU, the EIFFEL think tank has concluded that both clean-slate and evolutionary approaches are needed. In China, researchers and the country at large are enthusiastic about the promotion and immediate deployment of IPv6 due to the imminent problem of IPv4 address exhaustion.
In 2003, China launched a strategic programme called the China Next Generation Internet (CNGI). China expects its industry to be better positioned for future Internet technologies and services than it was for the first generation. With the support of a CNGI grant, the China Education and Research Network (CERNET) started to build an IPv6-only network, CNGI-CERNET2. It currently provides IPv6 access service for students and staff in many Chinese universities. In this article, we introduce the CNGI programme, the architecture of CNGI-CERNET2, and some aspects of CNGI-CERNET2's deployment and operation, such as transition, security, charging, and roaming services.
The most widely used technique for IP geolocation consists of building a database that maps IP blocks to geographic locations. Several such databases are available and are frequently used by many services and web sites on the Internet. Contrary to widespread belief, geolocation databases are far from being as reliable as they claim. In this paper, we compare several current geolocation databases (both commercial and free) to gain insight into the limitations of their usability.
First, the vast majority of entries in the databases refer only to a few popular countries (e.g., U.S.). This creates an imbalance in the representation of countries across the IP blocks of the databases. Second, these entries do not reflect the original allocation of IP blocks, nor BGP announcements. In addition, we quantify the accuracy of geolocation databases on a large European ISP based on ground truth information. This is the first study using a ground truth showing that the overly fine granularity of database entries makes their accuracy worse, not better. Geolocation databases can claim country-level accuracy, but certainly not city-level.
In a virtualized infrastructure where physical resources are shared, a single physical server failure will terminate several virtual servers and cripple the virtual infrastructures that contained those virtual servers. In the worst case, more failures may cascade from overloading the remaining servers. To guarantee some level of reliability, each virtual infrastructure, at instantiation, should be augmented with backup virtual nodes and links that have sufficient capacities. This ensures that, when physical failures occur, sufficient computing resources are available and the virtual network topology is preserved. However, in doing so, the utilization of the physical infrastructure may be greatly reduced. This can be circumvented if backup resources are pooled and shared across multiple virtual infrastructures, and intelligently embedded in the physical infrastructure. These techniques can reduce the physical footprint of virtual backups while guaranteeing reliability.
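A toy sizing model (my own illustration, under a simplifying assumption that only a bounded number of virtual infrastructures fail at once, which is not a claim from the paper) shows why pooling shrinks the backup footprint:

```python
def dedicated_backup(vn_backup_needs):
    # Each virtual infrastructure reserves its own backup servers,
    # so the footprint is the sum of all individual reservations.
    return sum(vn_backup_needs)

def pooled_backup(vn_backup_needs, simultaneous_failures=1):
    # A shared pool is sized for the worst case among any
    # `simultaneous_failures` virtual infrastructures failing together.
    worst = sorted(vn_backup_needs, reverse=True)[:simultaneous_failures]
    return sum(worst)
```

For instance, three virtual infrastructures needing 2, 3, and 1 backup servers respectively would reserve 6 servers if each provisions alone, but a shared pool sized for the worst single failure needs only 3.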
What is, or ought to be, the goal of systems research? The answer to this question differs for academics and researchers in industry. Researchers in industry usually work either directly or indirectly on a specific commercial project, and are therefore constrained to design and build a system that fits manifest needs. They do not need to worry about a goal beyond this somewhat narrow horizon. For instance, a researcher at Google may be given the task of building an efficient file system: higher-level goals beyond this are meaningless to him or her. So, the ‘goal’ of systems research is more or less trivial in the industrial context.
Many academic researchers in the area, however, are less constrained. Lacking an immediate project to work on, they are often left wondering what set of issues to address.
One solution is to work with industrial partners to find relevant problems. However, although this results in problems that are well-defined, immediately applicable, and even publishable in the best conferences, it is not clear whether this is the true role of academia. Why should industrial research be carried out for free by academics, in effect subsidized by society? I think that academics may be ‘inspired’ by industrial problems, but should set their sights higher.
Another easy path is to choose to work in a ‘hot’ area, as defined by the leaders in the community, or a funding agency (more often than not, these are identical). If DARPA declares technology X or Y to be its latest funding goal, it is not too hard to change one’s path to be a researcher of flavour X or Y. This path has the attraction that it guarantees a certain level of funding as well as a community of fellow researchers. However, letting others decide the research program does not sound too appealing. It is not that far from industrial research, except that the person to be satisfied is a program manager or funding agency, instead of your boss.
I think academic researchers ought to seek their own path relatively unfettered by considerations of industrial projects or the whims of funding agencies. This, therefore, immediately brings up the question of what ought to be the goal of their work. Here are my thoughts.
I believe that systems research lies in bridging two ‘gaps’: the Problem Selection Gap and the Infrastructure-Device Gap. In a nutshell, the goal of systems research is to satisfy application requirements, identified by crossing the Problem Selection Gap, by composing infrastructure from underlying devices, thereby bridging the Infrastructure-Device Gap. Let me explain this next.
What is the Infrastructure-Device Gap? Systems research results in the creation of systems infrastructure. By infrastructure, I mean a system that is widely used and that serves to improve the daily lives of its users in some way. Think of it as the analogue of water and electricity. By that token, Automatic Teller Machines, Internet search, airline reservation systems, and satellite remote sensing services are all instances of essential technological infrastructure.
Infrastructure is built by putting together devices. By devices, I actually mean sub-systems whose behaviour can be well-enough encapsulated to form building blocks for the next level of abstraction and complexity. For instance, from the perspective of a computer network researcher, a host is a single device. Yet, a host is a complex system in itself, with many hundreds of subsystems. So, the definition of device depends on the specific abstraction being considered, and I will take it to be self-evident, for the purpose of this discussion, what a device is.
An essential aspect of the composition of devices into infrastructure is that the infrastructure has properties that individual devices do not. Consider a RAID system, which provides fault tolerance far superior to that of an individual disk. The systems research here is to mask the problems of individual devices, that is, to compose the devices into a harmonious whole whose group properties, such as functionality, reliability, availability, efficiency, scalability, and flexibility, are superior to those of each device. This, then, is at the heart of systems research: how to take devices, appropriately defined, and compose them to create emergent properties in an infrastructure. We judge the quality of the infrastructure by the degree to which it meets its stated goals. Moreover, we can use a standard ‘bag of tricks’ (explained in the networking context in George Varghese’s superb book ‘Network Algorithmics’) to effect this composition.
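The RAID example can be made concrete with a tiny sketch: XOR parity, as used in RAID levels 4 and 5, gives a group of ‘disks’ a recovery property that no single disk has. The byte-string ‘disks’ below are illustrative stand-ins, not a real storage stack.

```python
def parity(blocks):
    """XOR parity over equal-length data blocks (as in RAID-4/5)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three 'disks' plus one parity disk.
disks = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(disks)

# If disk 1 fails, its contents are recovered by XOR-ing the survivors
# with the parity block: an emergent property of the composition,
# possessed by no individual disk.
recovered = parity([disks[0], disks[2], p])
assert recovered == disks[1]
print(recovered)  # b'BBBB'
```

The composition, not any component, supplies the fault tolerance; this is exactly the kind of emergent group property the paragraph above describes.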
Although satisfying, this definition of systems research leaves an important problem unresolved: how should one define the set of infrastructure properties in the first place? After all, for each set of desired properties, one can come up with a system design that best matches it. Are we to be resigned to a set of not just incompatible, but incomparable, system designs?
Here is where the Problem Selection Gap fits in. Systems are not built in a vacuum. They exist in a social context. In other words, systems are built for some purpose. In the context of industrial research, the purpose is that of the corporation, handed down to the researcher: ‘Thou Shalt Build a File System’, for instance. And along with this edict comes a statement of the performance, efficiency, and ‘ility’ goals for the system. In such situations, there is no choice in problem selection.
But what of the academic researcher? What are the characteristics of the infrastructure that the academic should seek to build? I believe that the answer is to look to the social context of academia. Universities are supported by the public at large in order to provide a venue for the solution of problems that afflict society at large. These are problems of health care, education, poverty, global warming, pollution, inner-city crime, and so on. As academics, it behooves us to do our bit to help society solve these problems. Therefore, I claim that as academics, we should choose one or more of these big problems, and then think of what type of system infrastructure we can build to either alleviate or solve it. This will naturally lead to a set of infrastructure requirements. In other words, there is no need to invent artificial problems to work on! There are enough real-world problems already. We only need to open our eyes.
The vast majority of research in wireless and mobile (WAM) networking falls in the MANET (Mobile Ad Hoc Network) category, where end-to-end paths are the norm. More recently, research has focused on a different Disruption Tolerant Network (DTN) paradigm, where end-to-end paths are the exception and intermediate nodes may store data while waiting for transfer opportunities towards the destination. Protocols developed for MANETs are generally not appropriate for DTNs and vice versa, since the connectivity assumptions are so different. We make the simple but powerful observation that MANETs and DTNs fit into a continuum that generalizes these two previously distinct categories. In this paper, building on this observation, we develop a WAM continuum framework that scopes the entire space of wireless and mobile networks, so that a network can be characterized by its position in this continuum. Certain network equivalence classes can be defined over subsets of this WAM continuum. We instantiate our framework to allow network connectivity classification and show how that classification relates to routing. We illustrate our approach by applying it to networks described by traces and by mobility models. We also outline how our framework can be used to guide network design and operation.
Data collected using traceroute-based algorithms underpins research into the Internet’s router-level topology, though it is possible to infer false links from this data. One source of false inference is the combination of per-flow load-balancing, in which more than one path is active from a given source to destination, and classic traceroute, which varies the UDP destination port number or ICMP checksum of successive probe packets; this variation can cause per-flow load-balancers to treat successive packets as distinct flows and forward them along different paths. Consequently, successive probe packets can solicit responses from unconnected routers, leading to the inference of false links. This paper examines the inaccuracies induced by such false inferences, on both macroscopic and ISP topology mapping. We collected macroscopic topology data to 365k destinations, with techniques that both do and do not try to capture load-balancing phenomena. We then use alias resolution techniques to infer whether a measurement artifact of classic traceroute induces a false router-level link. This technique detected that 2.71% and 0.76% of the links in our UDP and ICMP graphs, respectively, were falsely inferred due to the presence of load-balancing. We conclude that most per-flow load-balancing does not induce false links when macroscopic topology is inferred using classic traceroute. The effect of false links on ISP topology mapping is possibly much worse, because the degrees of a tier-1 ISP’s routers derived from classic traceroute were inflated by a median factor of 2.9 as compared to those inferred with Paris traceroute.
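A minimal sketch of the mechanism described above, using an arbitrary hash in place of a router’s real ECMP hash function: a per-flow load balancer picks a path from the flow key, so classic traceroute’s per-probe port variation can scatter probes across paths, while keeping the flow key fixed (the Paris traceroute idea) keeps every probe on one path.

```python
import hashlib

def next_hop(flow_tuple, num_paths=2):
    """A toy per-flow load balancer: hash the flow key, pick a path.
    Packets with the same flow key always follow the same path."""
    digest = hashlib.sha1(repr(flow_tuple).encode()).digest()
    return digest[0] % num_paths

src, dst, proto = "10.0.0.1", "192.0.2.1", "udp"

# Classic traceroute varies the UDP destination port per probe,
# so successive probes can hash to different paths...
classic = {next_hop((src, dst, proto, 33434 + i)) for i in range(10)}

# ...while a probe series with a fixed flow key stays on one path.
paris = {next_hop((src, dst, proto, 33434)) for _ in range(10)}

print(len(paris))    # 1: a single path, so no false links
print(len(classic))  # typically 2 when two paths are available
```

Responses arriving from routers on two different paths are what lead classic traceroute to stitch together links that do not exist.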
Recent research on Internet traffic classification has produced a number of approaches for distinguishing types of traffic. However, a rigorous comparison of such proposed algorithms still remains a challenge, since every proposal considers a different benchmark for its experimental evaluation. A lack of clear consensus on an objective and scientific way for comparing results has made researchers uncertain of fundamental as well as relative contributions and limitations of each proposal. In response to the growing necessity for an objective method of comparing traffic classifiers and to shed light on scientifically grounded traffic classification research, we introduce an Internet traffic classification benchmark tool, NeTraMark. Based on six design guidelines (Comparability, Reproducibility, Efficiency, Extensibility, Synergy, and Flexibility/Ease-of-use), NeTraMark is the first Internet traffic classification benchmark where eleven different state-of-the-art traffic classifiers are integrated. NeTraMark allows researchers and practitioners to easily extend it with new classification algorithms and compare them with other built-in classifiers, in terms of three categories of performance metrics: per-whole-trace flow accuracy, per-application flow accuracy, and computational performance.
Proliferation and innovation of wireless technologies require significant amounts of radio spectrum. Recent policy reforms by the FCC are paving the way by freeing up spectrum for a new generation of frequency-agile wireless devices based on software defined radios (SDRs). But despite recent advances in SDR hardware, research on SDR MAC protocols or applications requires an experimental platform for managing physical access. We introduce Papyrus, a software platform that lets wireless researchers develop and experiment with dynamic spectrum systems using currently available SDR hardware. Papyrus provides two fundamental building blocks at the physical layer: flexible non-contiguous frequency access and simple, robust frequency detection. Papyrus allows researchers to deploy and experiment with new MAC protocols and applications on USRP GNU Radio, and can also be ported to other SDR platforms. We demonstrate the use of Papyrus with Jello, a distributed MAC overlay for high-bandwidth media streaming applications, and Ganache, an SDR layer for adaptable guardband configuration. Full implementations of Papyrus and Jello are publicly available.
This paper performs controlled experiments with two popular virtualization techniques, Linux-VServer and Xen, to examine the effects of virtualization on packet sending and receiving delays. Using a controlled setting allows us to independently investigate the influence on delay measurements when competing virtual machines (VMs) perform tasks that consume CPU, memory, I/O, hard disk, and network bandwidth. Our results indicate that heavy network usage from competing VMs can introduce delays as high as 100 ms to round-trip times. Furthermore, virtualization adds most of this delay when sending packets, whereas packet reception introduces little extra delay. Based on our findings, we discuss guidelines and propose a feedback mechanism to avoid measurement bias under virtualization.
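For readers who want to reproduce the flavor of such delay measurements (this is a generic sketch, not the paper’s testbed, tooling, or methodology), a local UDP echo loop is enough to collect round-trip samples:

```python
import socket
import threading
import time

def udp_echo_server(sock):
    """Echo every datagram back to its sender until told to stop."""
    while True:
        data, addr = sock.recvfrom(2048)
        if data == b"stop":
            break
        sock.sendto(data, addr)

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # let the kernel pick a free port
addr = server.getsockname()
threading.Thread(target=udp_echo_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rtts = []
for _ in range(20):
    t0 = time.perf_counter()
    client.sendto(b"probe", addr)
    client.recvfrom(2048)
    rtts.append((time.perf_counter() - t0) * 1000.0)  # milliseconds

client.sendto(b"stop", addr)
print(f"median RTT: {sorted(rtts)[len(rtts) // 2]:.3f} ms")
```

Running such a probe loop inside a VM, while competing VMs stress the CPU or network, is the kind of controlled comparison the paper describes; on a loopback path the sub-millisecond baseline makes even small virtualization-induced delays visible.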
Scalability is said to be one of the major advantages brought by the cloud paradigm and, more specifically, the one that distinguishes it from an “advanced outsourcing” solution. However, some important issues remain open before the dream of automated application scaling can come true. In this paper, the most notable initiatives towards whole-application scalability in cloud environments are presented. We present relevant efforts at the edge of the state of the art, providing an encompassing overview of the trends they each follow. We also highlight pending challenges that will likely be addressed in new research efforts and present an ideal scalable cloud system.
We are pleased to announce the release of a tool that records detailed measurements of the wireless channel along with received 802.11 packet traces. It runs on a commodity 802.11n NIC, and records Channel State Information (CSI) based on the 802.11 standard. Unlike Receive Signal Strength Indicator (RSSI) values, which merely capture the total power received at the listener, the CSI contains information about the channel between sender and receiver at the level of individual data subcarriers, for each pair of transmit and receive antennas.
Our toolkit uses the Intel WiFi Link 5300 wireless NIC with 3 antennas. It works on up-to-date Linux operating systems: in our testbed we use Ubuntu 10.04 LTS with the 2.6.36 kernel. The measurement setup comprises our customized versions of Intel’s closed-source firmware and the open-source iwlwifi wireless driver, userspace tools to enable these measurements, access point functionality for controlling both ends of the link, and Matlab (or Octave) scripts for data analysis. We are releasing the binary of the modified firmware and the source code of all the other components.
Research on networks for challenged environments has recently become a major research area. There is, however, a lack of true understanding among networking researchers of what such environments are really like. In this paper we give an introduction to the ExtremeCom series of workshops, which were created to overcome this limitation. We discuss the motivation behind creating the workshop series, summarize the two workshops that have been held, and discuss the lessons we have learned from them.
Accurate measurements of deployed wireless networks are vital for researchers to perform realistic evaluation of proposed systems. Unfortunately, the difficulty of performing detailed measurements limits the consistency in parameters and methodology of current datasets. Using different datasets, multiple research studies can arrive at conflicting conclusions about the performance of wireless systems. Correcting this situation requires consistent and comparable wireless traces collected from a variety of deployment environments. In this paper, we describe AirLab, a distributed wireless data collection infrastructure that uses uniformly instrumented measurement nodes at heterogeneous locations to collect consistent traces of both standardized and user-defined experiments. We identify four challenges in the AirLab platform (consistency, fidelity, privacy, and security) and describe our approaches to addressing them.
This document collects reports of the sessions from the 2010 ACM SIGCOMM Conference, the annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM), on the applications, technologies, architectures, and protocols for computer communication.
In managing and troubleshooting home networks, one of the challenges is in knowing what is actually happening. Availability of a record of events that occurred on the home network before trouble appeared would go a long way toward addressing that challenge. In this position/work-in-progress paper, we consider requirements for a general-purpose logging facility for home networks. Such a facility, if properly designed, would potentially have other uses. We describe several such uses and discuss requirements to be considered in the design of a logging platform that would be widely supported and accepted. We also report on our initial deployment of such a facility.
HTTP (Hypertext Transfer Protocol) was originally used primarily for human-initiated client-server communications launched from web browsers on traditional computers and laptops. Today, however, it has become the protocol of choice for a bewildering range of applications on a wide array of emerging devices, such as smart TVs and gaming consoles. This paper presents an initial study characterizing the non-traditional sources of HTTP traffic, such as consumer devices and automated updates, within the overall HTTP traffic of residential Internet users. Among our findings, 13% of all HTTP traffic in terms of bytes is due to non-traditional sources, with 5% coming from consumer devices such as WiFi-enabled smartphones and 8% generated by automated software updates and background processes. Our findings show that 11% of all HTTP requests are caused by communications with advertising servers from as many as 190 countries worldwide, suggesting the widespread prevalence of such activities. Overall, our findings start to answer questions about the state of traffic generated in these smart homes.
Data centers are a major consumer of electricity, and a significant fraction of their energy use is devoted to cooling the data center. Recent prototype deployments have investigated the possibility of using outside air for cooling and have shown large potential savings in energy consumption. In this paper, we push this idea to the extreme, by running servers outdoors through a Finnish winter. Our results show that commercial, off-the-shelf computer equipment can tolerate extreme conditions, such as outside air temperatures below -20C, and still function correctly over extended periods of time. Our experiment improves upon other recent results by confirming their findings and extending them to cover a wider range of intake air temperatures and humidity. This paper presents our experimental methodology and setup, and our main findings and observations.
Energy consumption is a major and costly problem in data centers. A large fraction of this energy goes to powering idle machines that are not doing any useful work. We identify two causes of this inefficiency: low server utilization and a lack of power-proportionality. To address this problem we present a design for a power-proportional cluster consisting of a power-aware cluster manager and a set of heterogeneous machines. Our design makes use of currently available energy-efficient hardware, mechanisms for transitioning in and out of low-power sleep states, and dynamic provisioning and scheduling to continually adjust to the workload and minimize power consumption. With our design we are able to reduce energy consumption while maintaining acceptable response times for a web service workload based on Wikipedia. With our dynamic provisioning algorithms we demonstrate via simulation a 63% savings in power usage relative to a typically provisioned datacenter where all machines are left on and awake at all times. Our results show that we achieve close to 90% of the savings that a theoretically optimal provisioning scheme would achieve. We have also built a prototype cluster running Wikipedia to demonstrate our design in a real environment.
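The core provisioning idea, keeping only as many machines awake as the current load requires, can be sketched in a few lines. This is illustrative only; the capacities, the headroom factor, and the function name are invented, not the paper’s algorithm.

```python
import math

def servers_needed(request_rate, per_server_capacity, headroom=1.2):
    """Keep just enough machines awake to serve the current load, with a
    safety margin; the rest can transition to a low-power sleep state."""
    return max(1, math.ceil(request_rate * headroom / per_server_capacity))

# Illustrative diurnal load (requests/s) against 100 req/s servers.
for load in (50, 400, 900, 150):
    awake = servers_needed(load, per_server_capacity=100)
    print(f"load={load:4d} req/s -> keep {awake} servers awake")
```

A real power-aware cluster manager must also account for heterogeneous machine capacities, sleep/wake transition latencies, and response-time targets, which is where the dynamic scheduling described in the abstract comes in.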
Several powerful forces are gathering to make fundamental and irrevocable changes to the century-old grid. The next-generation grid, often called the ‘smart grid,’ will feature distributed energy production, vastly more storage, tens of millions of stochastic renewable-energy sources, and the use of communication technologies both to allow precise matching of supply to demand and to incentivize appropriate consumer behaviour. These changes will have the effect of reducing energy waste and reducing the carbon footprint of the grid, making it ‘smarter’ and ‘greener.’ In this position paper, we discuss how the concepts and techniques pioneered by the Internet, the fruit of four decades of research in this area, are directly applicable to the design of a smart, green grid. This is because both the Internet and the electrical grid are designed to meet fundamental needs, for information and for energy, respectively, by connecting geographically dispersed suppliers with geographically dispersed consumers. Keeping this and other similarities (and fundamental differences, as well) in mind, we propose several specific areas where Internet concepts and technologies can contribute to the development of a smart, green grid. We also describe some areas where the Internet operations can be improved based on the experience gained in the electrical grid. We hope that our work will initiate a dialogue between the Internet and the smart grid communities.