On February 8-10, 2012, CAIDA hosted the fourth Workshop on Active Internet Measurements (AIMS-4) as part of our series of Internet Statistics and Metrics Analysis (ISMA) workshops. As with the previous three AIMS workshops, the goals were to further our understanding of the potential and limitations of active measurement research and infrastructure in the wide-area Internet, and to promote cooperative solutions and coordinated strategies to address future data needs of the network and security operations and research communities. This year we continued to focus on how measurement can illuminate two specific public policy concerns: IPv6 deployment and broadband performance. This report briefly describes topics discussed at this year's workshop. Slides and other materials related to the workshop are available at http://www.caida.org/.
This paper provides an interdisciplinary perspective concerning the role of prosumers on future Internet design based on the current trend of Internet user empowerment. The paper debates the prosumer role, and addresses models to develop a symmetric Internet architecture and supply-chain based on the integration of social capital aspects. It has as goal to ignite the discussion concerning a socially-driven Internet architectural design.
The late noughties have seen an influx of work in different scientific disciplines, all addressing the question of 'design' and 'architecture'. It is a battle between those advocating the theory of 'emergent properties' and others who strive for a 'theory for 'architecture'. We provide a particular insight into this battle, represented in the form of a story that focuses on the role of a possibly unusual protagonist and his influence on computer science, the Internet, architecture and beyond. We show his relation to one of the great achievements of system engineering, the Internet, and the possible future as it might unfold. Note from the writer: The tale is placed in a mixture of reality and fiction, while postulating a certain likelihood for this fiction. There is no proof for the assertions made in this tale, leaving the space for a sequel to be told.
Managing a home network is challenging because the underlying infrastructure is so complex. Existing interfaces either hide or expose the network's underlying complexity, but in both cases, the information that is shown does not necessarily allow a user to complete desired tasks. Recent advances in software defined networking, however, permit a redesign of the underlying network and protocols, potentially allowing designers to move complexity further from the user and, in some cases, eliminating it entirely. In this paper, we explore whether the choices of what to make visible to the user in the design of today's home network infrastructure, performance, and policies make sense. We also examine whether new capabilities for refactoring the network infrastructure - changing the underlying system without compromising existing functionality - should cause us to revisit some of these choices. Our work represents a case study of how co-designing an interface and its underlying infrastructure could ultimately improve interfaces for that infrastructure.
In Named Data Networking (NDN) architecture, packets carry data names rather than source or destination addresses. This change of paradigm leads to a new data plane: data consumers send out Interest packets, routers forward them and maintain the state of pending Interests, which is used to guide Data packets back to the consumers. NDN routers' forwarding process is able to detect network problems by observing the two-way traffic of Interest and Data packets, and explore multiple alternative paths without loops. This is in sharp contrast to today's IP forwarding process which follows a single path chosen by the routing process, with no adaptability of its own. In this paper we outline the design of NDN's adaptive forwarding, articulate its potential benefits, and identify open research issues.
As a networking researcher, working with computer networks day in and day out, you probably have rarely paused to reflect on the surprisingly difficult question of "What is a network?" For example, would you consider a bio-chemical system to be a network? How about a social network? Or a water supply network? Or the electrical grid? After all, all of these share some aspects in common with a computer network: they can be represented as a graph and they carry a flow (of chemical signals, messages, water, and electrons, respectively) from one or more sources to one or more destinations. So, shouldn't we make them equally objects of study by SIGCOMM members?
Desynchronization is useful for scheduling nodes to perform tasks at different time. This property is desirable for resource sharing, TDMA scheduling, and collision avoiding. Inspired by robotic circular formation, we propose DWARF (Desynchronization With an ARtificial Force field), a novel technique for desynchronization in wireless networks. Each neighboring node has artificial forces to repel other nodes to perform tasks at different time phases. Nodes with closer time phases have stronger forces to repel each other in the time domain. Each node adjusts its time phase proportionally to its received forces. Once the received forces are balanced, nodes are desynchronized. We evaluate our implementation of DWARF on TOSSIM, a simulator for wireless sensor networks. The simulation results indicate that DWARF incurs significantly lower desynchronization error and scales much better than existing approaches.
The IPv4 address space is quickly getting exhausted, putting a tremendous pressure on the adoption of even more NAT levels or IPv6. On the other hand, many authors propose the adoption of new Internet addressing capabilities, namely content-based addressing, to complement the existing IP host-based addressing. In this paper we propose the introduction of a location layer, between transport and network layers, to address both problems. We keep the existing IPv4 (or IPv6) host-based core routing functionalities, while we enable hosts to become routers between separate address spaces by exploring the new location header. For a proof of concept, we modified the TCP/IP stack of a Linux host to handle our new protocol layer and we designed and conceived a novel NAT box to enable current hosts to interact with the modified stack.
In many wireless systems, it is desirable to precede a data transmission with a handshake between the sender and the receiver. For example, RTS-CTS is a handshake that prevents collisions due to hidden terminals. Past work, however, has shown that the overhead of such handshake is too high for practical deployments. We present a new approach to wireless handshake that is almost overhead free. The key idea underlying the design is to separate a packet's PLCP header and MAC header from its body and have the sender and receiver first exchange the data and ACK headers, then exchange the bodies of the data and ACK packets without additional headers. The header exchange provides a natural handshake at almost no extra cost. We empirically evaluate the feasibility of such lightweight handshake and some of its applications. Our testbed evaluation shows that header-payload separation does not hamper packet decodabilty. It also shows that a light handshake enables hidden terminals, i.e., nodes that interfere with each other without RTS/CTS, to experience less than 4% of collisions. Furthermore, it improves the accuracy of bit rate selection in bursty and mobile environments producing a throughput gain of about 2x.
Internet services are often deployed in multiple (tens to hundreds) of geographically distributed data centers. They rely on Global Traffic Management (GTM) solutions to direct clients to the optimal data center based on a number of criteria like network performance, geographic location, availability, etc. The GTM solutions, however, have a fundamental design limitation in their ability to accurately map clients to data centers - they use the IP address of the local DNS resolver (LDNS) used by a client as a proxy for the true client identity, which in some cases causes suboptimal performance. This issue is known as the client-LDNS mismatch problem. We argue that recent proposals to address the problem suffer from serious limitations. We then propose a simple new solution, named ``FQDN extension'', which can solve the client-LDNS mismatch problem completely. We build a prototype system and demonstrate the effectiveness of the proposed solution. Using JavaScript, the solution can be deployed immediately for some online services, such as Web search, without modifying either client or local resolver.
This paper introduces libtrace, an open-source software library for reading and writing network packet traces. Libtrace offers performance and usability enhancements compared to other libraries that are currently used. We describe the main features of libtrace and demonstrate how the libtrace programming API enables users to easily develop portable trace analysis tools without needing to consider the details of the capture format, file compression or intermediate protocol headers. We compare the performance of libtrace against other trace processing libraries to show that libtrace offers the best compromise between development effort and program run time. As a result, we conclude that libtrace is a valuable contribution to the passive measurement community that will aid the development of better and more reliable trace analysis and network monitoring tools.
Correctness of the Chord ring-maintenance protocol would mean that the protocol can eventually repair all disruptions in the ring structure, given ample time and no further disruptions while it is working. In other words, it is "eventual reachability." Under the same assumptions about failure behavior as made in the Chord papers, no published version of Chord is correct. This result is based on modeling the protocol in Alloy and analyzing it with the Alloy Analyzer. By combining the right selection of pseudocode and textual hints from several papers, and fixing flaws revealed by analysis, it is possible to get a version that may be correct. The paper also discusses the significance of these results, describes briefly how Alloy is used to model and reason about Chord, and compares Alloy analysis to model-checking.
In spite of the tremendous amount of measurement efforts on understanding the Internet as a global system, little is known about the 'local' Internet (among ISPs inside a region or a country) due to limitations of the existing measurement tools and scarce data. In this paper, empirical in nature, we characterize the evolution of one such ecosystem of local ISPs by studying the interactions between ISPs happening at the Slovak Internet eXchange (SIX). By crawling the web archive waybackmachine.org we collect 158 snapshots (spanning 14 years) of the SIX website, with the relevant data that allows us to study the dynamics of the Slovak ISPs in terms of: the local ISP peering, the traffic distribution, the port capacity/utilization and the local AS-level traffic matrix. Examining our data revealed a number of invariant and dynamic properties of the studied ecosystem that we report in detail.
Traditional traffic engineering adapts the routing of traffic within the network to maximize performance. We propose a new approach that also adaptively changes where traffic enters and leaves the network—changing the “traffic matrix”, and not just the intradomain routing configuration. Our approach does not affect traffic patterns and BGP routes seen in neighboring networks, unlike conventional inter-domain traffic engineering where changes in BGP policies shift traf-
There are many deployed approaches for blocking unwanted traffic, either once it reaches the recipient's network, or closer to its point of origin. One of these schemes is based on the notion of traffic carrying capabilities that grant access to a network and/or end host. However, leveraging capabilities results in added complexity and additional steps in the communication process: Before communication starts a remote host must be vetted and given a capability to use in the subsequent communication. In this paper, we propose a lightweight mechanism that turns the answers provided by DNS name resolution - which Internet communication broadly depends on anyway - into capabilities. While not achieving an ideal capability system, we show the mechanism can be built from commodity technology and is therefore a pragmatic way to gain some of the key benefits of capabilities without requiring new infrastructure.
Operators of high-profile DNS zones utilize multiple authority servers for performance and robustness. We conducted a series of trace-driven measurements to understand how current caching resolver implementations distribute queries among a set of authority servers. Our results reveal areas for improvement in the ``apparently sound'' server selection schemes used by some popular implementations. In some cases, the selection schemes lead to sub-optimal behavior of caching resolvers, e.g. sending a significant amount of queries to unresponsive servers. We believe that most of these issues are caused by careless implementations, such as keeping decreasing a server's SRTT after the server has been selected, treating unresponsive servers as responsive ones, and using constant SRTT decaying factor. For the problems identified in this work, we recommended corresponding solutions.
Operators have deployed Multiprotocol Label Switching (MPLS) in the Internet for over a decade. However, its impact on Internet topology measurements is not well known, and it is possible for some MPLS configurations to lead to false router-level links in maps derived from traceroute data. In this paper, we introduce a measurement-based classification of MPLS tunnels, identifying tunnels where IP hops are revealed but not explicitly tagged as label switching routers, as well as tunnels that obscure the underlying path. Using a large-scale dataset we collected, we show that paths frequently cross MPLS tunnels in today's Internet: in our data, at least 30% of the paths we tested traverse an MPLS tunnel. We also propose and evaluate several methods to reveal MPLS tunnels that are not explicitly flagged as such: we discover that their fraction is significant (up to half the explicit tunnel quantity) but most of them do not obscure IP-level topology discovery.
People everywhere are generating ever-increasing amounts of data, often without being fully aware of who is recording what about them. For example, initiatives such as mandated smart metering, expected to be widely deployed in the UK in the next few years and already attempted in countries such as the Netherlands, will generate vast quantities of detailed, personal data about huge segments of the population. Neither the impact nor the potential of this society-wide data gathering are well understood. Once data is gathered, it will be processed -- and society is only now beginning to grapple with the consequences for privacy, both legal and ethical, of these actions, e.g., Brown et al. There is the potential for great harm through, e.g., invasion of privacy; but also the potential for great benefits by using this data to make more efficient use of resources, as well as releasing its vast economic potential. In this editorial we briefly discuss work in this area, the challenges still faced, and some potential avenues for addressing them.
Time tends to pass more quickly than we would like. Sometimes it is helpful to reflect on what you have accomplished, and to derive what you have learned from the experiences. These "lessons learned" may then be leveraged by yourself or others in the future. Occasionally, an external event will motivate this self reflection. For me, it was the 50th anniversary reunion of the St. Walburg Eagles, held in July 2011. The Eagles are a full-contact (ice) hockey team I played with between 1988 and 1996 (the Eagles ceased operations twice during this period, which limited me to four seasons playing with them), while attending university. What would I tell my friends and former teammates that I had been doing for the past 15 years? After some thought, I realized that my time as an Eagle had prepared me for a research career, in ways I would never have imagined. This article (an extended version with color photos is available in [1]) shares some of these similarities, to motivate others to reflect on their own careers and achievements, and perhaps make proactive changes as a result.
The Internet is not a Universal service, but then neither is democracy. So should the Internet be viewed as a right? It's certainly sometimes wrong. In this brief article, we depend on the Internet to reach our readers, and we hope that they don't object our doing that.
It has become a truism that innovation in the information and communications technology (ICT) fields is occurring faster than ever before. This paper posits that successful innovation requires three essential elements: a need, know-how or knowledge, and favorable economics. The paper examines this proposition by considering three technical areas in which there has been significant innovation in recent years: server virtualization and the cloud, mobile application optimization, and mobile speech services. An understanding of the elements that contribute to successful innovation is valuable to anyone that does either fundamental or applied research in fields of information and communication technology.
The second Workshop on Internet Economics [2], hosted by CAIDA and Georgia Institute of Technology on December 1-2, 2011, brought together network technology and policy researchers with providers of commercial Internet facilities and services (network operators) to further explore the common objective of framing an agenda for the emerging but empirically stunted field of Internet infrastructure economics. This report describes the workshop discussions and presents relevant open research questions identified by its participants.
I'd like to devote this editorial to a description of the process we use to select and publish papers submitted to CCR. CCR publishes two types of papers: technical papers and editorials. I'll first describe the process for technical papers then for editorials.
Common Wireless LAN (WLAN) pathologies include low signal-to-noise ratio, congestion, hidden terminals or interference from non-802.11 devices and phenomena. Prior work has focused on the detection and diagnosis of such problems using layer-2 information from 802.11 devices and special purpose access points and monitors, which may not be generally available. Here, we investigate a user-level approach: is it possible to detect and diagnose 802.11 pathologies with strictly user-level active probing, without any cooperation from, and without any visibility in, layer-2 devices? In this paper, we present preliminary but promising results indicating that such diagnostics are feasible.
Internet traffic has Zipf-like properties at multiple aggregation levels. These properties suggest the possibility of offloading most of the traffic from a complex controller (e.g., a software router) to a simple forwarder (e.g., a commodity switch), by letting the forwarder handle a very limited set of flows; the heavy hitters. As the volume of traffic from a set of flows is highly dynamic, maintaining a reliable set of heavy hitters over time is challenging. This is especially true when we face a volume limit in the non-offloaded traffic in combination with a constraint in the size of the heavy hitter set or its rate of change. We propose a set selection strategy that takes advantage of the properties of heavy hitters at different time scales. Based on real Internet traffic traces, we show that our strategy is able to offload most of the traffic while limiting the rate of change of the heavy hitter set, suggesting the feasibility of alternative router designs.
We demonstrate that the Internet has a formula linking demand, capacity and performance that in many ways is the analogue of the Erlang loss formula of telephony. Surprisingly, this formula is none other than the Erlang delay formula. It provides an upper bound on the probability a flow of given peak rate suffers degradation when bandwidth sharing is max-min fair. Apart from the flow rate, the only relevant parameters are link capacity and overall demand. We explain why this result is valid under a very general and realistic traffic model and discuss its significance for network engineering.
Unsolicited one-way Internet traffic, also called Internet background radiation (IBR), has been used for years to study malicious activity on the Internet, including worms, DoS attacks, and scanning address space looking for vulnerabilities to exploit. We show how such traffic can also be used to analyze macroscopic Internet events that are unrelated to malware. We examine two phenomena: country-level censorship of Internet communications described in recent work, and natural disasters (two recent earthquakes). We introduce a new metric of local IBR activity based on the number of unique IP addresses per hour contributing to IBR. The advantage of this metric is that it is not affected by bursts of traffic from a few hosts. Although we have only scratched the surface, we are convinced that IBR traffic is an important building block for comprehensive monitoring, analysis, and possibly even detection of events unrelated to the IBR itself. In particular, IBR offers the opportunity to monitor the impact of events such as natural disasters on network infrastructure, and in particular reveals a view of events that is complementary to many existing measurement platforms based on (BGP) control-plane views or targeted active ICMP probing.
Researchers studying the interdomain routing system, its properties and new protocols, face many challenges in performing realistic evaluations and simulations. Modeling decisions with respect to AS-level topology, routing policies and traffic matrices are complicated by a scarcity of ground truth for each of these components. Moreover, scalability issues arise when attempting to simulate over large (although still incomplete) empirically-derived AS-level topologies. In this paper, we discuss our approach for analyzing the robustness of our results to incomplete empirical data. We do this by (1) developing fast simulation algorithms that enable us to (2) running multiple simulations with varied parameters that test the sensitivity of our research results.
Long-term historical analysis of captured network traffic is a topic of great interest in network monitoring and network security. A critical requirement is the support for fast discovery of packets that satisfy certain criteria within large-scale packet repositories. This work presents the first indexing scheme for network packet traces based on compressed bitmap indexing principles. Our approach supports very fast insertion rates and results in compact index sizes. The proposed indexing methodology builds upon libpcap, the de-facto reference library for accessing packet-trace repositories. Our solution is therefore backward compatible with any solution that uses the original library. We experience impressive speedups on packet-trace search operations: our experiments suggest that the index-enabled libpcap may reduce the packet retrieval time by more than 1100 times.
We develop a holistic cost model that operators can use to help evaluate the costs of various routing and peering decisions. Using real traffic data from a large carrier network, we show how network operators can use this cost model to significantly reduce the cost of carrying traffic in their networks. We find that adjusting the routing for a small fraction of total flows (and total traffic volume) significantly reduces cost in many cases. We also show how operators can use the cost model both to evaluate potential peering arrangements and for other network operations problems.
ADSL and cable connections are the prevalent technologies available from Internet Service Providers (ISPs) for residential Internet access. Asymmetric access technologies such as these offer high download capacity, but moderate upload capacity. When the Transmission Control Protocol (TCP) is used on such access networks, performance degradation can occur. In particular, sharing a bottleneck link with different upstream and downstream capacities among competing TCP flows in opposite directions can degrade the throughput of the higher speed link. Despite many research efforts to solve this problem in the past, there is no solution that is both highly effective and easily deployable in residential networks. In this paper, we propose an Asymmetric Queueing (AQ) mechanism that enables full utilization of the bottleneck access link in residential networks with asymmetric capacities. The extensive simulation evaluation of our design shows its effectiveness and robustness in a variety of network conditions. Furthermore, our solution is easy to deploy and configure in residential networks.
Distributed denial of service attacks are often considered just a security problem. While this may be the way to view the problem with the Internet of today, perhaps new network architectures attempting to address the issue should view it as a scalability problem. In addition, they may need to approach the problem based on a rigorous foundation.
This document provides reports on the presentations at the SIGCOMM 2011 Conference, the annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM).
This editorial was motivated by a panel on the relationship between academia and industry at the SIGCOMM 2011 conference that was moderated by Bruce Davie. I can claim some familiarity with the topic having spent roughly ten years each in academia and industry during the last twenty years.
My thesis is that although industry can make incremental gains, real technical breakthroughs can only come from academia. However, to have any impact, these academic breakthroughs must be motivated, at some level, by a real-world problem and the proposed solutions should be feasible, even if implausible. Therefore, it is in the self-interest of industry to fund risky, longterm, curiosity-driven academic research rather than sure-shot, short-term, practical research with welldefined objectives. Symmetrically, it is in the self-interest of academic researchers to tackle real-world problems motivated by the problems faced in industry and propose reasonable solutions.
There are many underlying reasons why technological revolutions today can only come from academia. Perhaps the primary reason is that, unlike most industrial research labs of today (and I am purposely excluding the late, great, Bell Labs of yore), academia still supports long-term, curiosity-driven research. This is both risky and an inherently `wasteful’ use of time. Yet, this apparently wasteful work is the basis for many of today’s technologies, ranging from Google search to the World Wide Web to BSD Unix and Linux. On closer thought, this is not too surprising. Short-term, practical research requires the investigator to have well-defined goals. But revolutionary ideas cannot be reduced to bullet items on Powerpoint slides: they usually arise as unexpected outcomes of curiosity-driven research. Moreover, it takes time for ideas to mature and for the inevitable missteps to be detected and corrected. Industrial funding cycles of six months to a year are simply not set up to fund ideas whose maturation can take five or even ten years. In contrast, academic research built on the basis of academic tenure and unencumbered by the demands of the marketplace is the ideal locus for long-term work.
Long-term, curiosity-driven research alone does not lead to revolutions. It must go hand-in-hand with an atmosphere of intellectual openness and rigour. Ideas must be freely exchanged and freely shot down.
The latest work should be widely disseminated and incorporated into one’s thinking. This openness is antithetical to the dogma of `Intellectual Property’ by which most corporations are bound. Academia, thankfully, has mostly escaped from this intellectual prison. Moreover, industry is essentially incompatible with intellectual rigour: corporate researchers, by and large, cannot honestly comment on the quality of their own company’s products and services.
A third ingredient in the revolutionary mix is the need for intense thinking by a dedicated group of researchers. Hands-on academic research tends to be carried out by young graduate students (under the supervision of their advisors) who are unburdened by either responsibilities or by knowing that something just cannot be done. Given training and guidance, given challenging goals, and given a soul-searing passion to make a difference in the world, a mere handful of researchers can do what corporate legions cannot.
These three foundations of curiosity-driven research, intellectual openness, and intense thinking set academic research apart from the industrial research labs of today and are also the reason why the next technological revolution is likely to come from academia, not industry.
In the foregoing, I admit that I have painted a rather rosy picture of academic research. It is important to recognize, however, that the same conditions that lead to breakthrough research also are susceptible to abuse. The freedom to pursue long-term ideas unconstrained by the marketplace can also lead to work that is shoddy and intellectually dishonest. For instance, I believe that it may be intellectually honest for a researcher to make assumptions that do not match current technology, but it is intellectually dishonest to make assumptions that violate the laws of physics. In a past editorial, I have written in more depth about these assumptions, so I will not belabour the point. I will merely remark here that it is incumbent on academic researchers not to abuse their freedom.
A second inherent problem with academic research, especially in the field of computer networking, is that it is difficult, perhaps impossible, to do large-scale datadriven research. As a stark example, curiosity-driven work on ISP topology is impossible if ISPs sequester this data. Similarly, studying large-scale data centre topology is challenging when the largest data centre one can build in academia has only a few hundred servers.
Finally, academic research tends to be self-driven and sometimes far removed from real-world problems. These real-world problems, which are faced daily by industrial researchers, can be intellectually demanding and their solution can be highly impactful. Academic researchers would benefit from dialogue with industrial researchers in posing and solving such problems.
Given this context, the relationship between academia and industry becomes relatively clear. What academia has and industry needs is committed, focussed researchers and the potential for long-term, revolutionary work. What industry has and academia needs is exposure to real-world problems, large-scale data and systems, and funding. Therefore, it would be mutually beneficial for each party to contribute to the other. Here are a few specific suggestions how.
First, industry should fund academic research without demanding concrete deliverables and unnecessary constraints. Of course, the research (and, in particular, the research assumptions) should be adequately monitored. But the overall expectation should be that academic work would be curiosity-driven, open, and long-term.
Second, industry should try to expose academic researchers to fundamental real-world problems and put at their disposal the data that is needed for their solution. If necessary, academic researchers should be given access to large-scale systems to try out their solutions. This can be done without loss of intellectual property by having students and PIs visit industrial research labs as interns or during sabbaticals. It could also be done by having industrial researchers spend several weeks or months as visitors to university research labs.
Third, industry should spend resources not only on funding, but on internal resources to match the output of academic research (papers and prototypes) to their own needs (products and systems).
Fourth, academic researchers should choose research problems based not just on what is publishable, but (also) based on the potential for real-world impact. This would naturally turn them to problems faced by industry.
Fifth, academic researchers should ensure that their solutions are feasible, even if implausible. For instance, a wireless system for cognitive radio built on USRP boards is implausible but feasible. In contrast, a wireless system that assumes that all radio coverage areas are perfectly circular is neither plausible nor feasible. This distinction should be emphasized in the academic review of technical papers.
Finally, academic researchers should recognize the constraints under which industry operates and, to the extent possible, accommodate them. For instance, they should encourage students to take on internships, fight the inevitable battles with the university office of research to negotiate IP terms, and understand that their points of contact will change periodically due to the nature of corporate (re-)organizations.
The SIG can also help this interaction. Industry-academic fora such as the panel at SIGCOMM, industryspecific workshops, and industry desks at conferences allow academic researchers to interact with representatives from industry. SIGCOMM could have tutorials focussed on topics of current interest to industry. These two efforts would certainly make deep collaboration between academia and industry more likely.
I hope that these steps will move our community towards a future where academic research, though curiosity-driven, continues to drive real-world change because of its symbiotic relationship with industrial partners.
This editorial benefited from comments from Bruce Davie and Gail Chopiak.
Several traffic monitoring applications may benefit from the availability of efficient mechanisms for approximately tracking smoothed time averages rather than raw counts. This paper provides two contributions in this direction. First, our analysis of Time-decaying Bloom filters, formerly proposed data structures devised to perform approximate Exponentially Weighted Moving Averages on streaming data, reveals two major shortcomings: biased estimation when measurements are read in arbitrary time instants, and slow operation resulting from the need to periodically update all the filter's counters at once. We thus propose a new construction, called On-demand Time-decaying Bloom filter, which relies on a continuous-time operation to overcome the accuracy/performance limitations of the original window-based approach. Second, we show how this new technique can be exploited in thedesign of high performance stream-based monitoring applications, by developing VoIPSTREAM, a proof-of-concept real-time analysis version of a formerly proposed system for telemarketing call detection. Our validation results, carried out over real telephony data, show how VoIPSTREAM closely mimics the feature extraction process and traffic analysis techniques implemented in the offline system, at a significantly higher processing speed, and without requiring any storage of per-user call detail records.
The Internet has changed dramatically in recent years. In particular, the fundamental change has occurred in terms of who generates most of the content, the variety of applications used and the diverse ways normal users connect to the Internet. These factors have led to an explosion of the amount of user-specific meta-information that is required to access Internet content (e.g., email addresses, URLs, social graphs). In this paper we describe a foundational service for storing and sharing user-specific meta-information and describe how this new abstraction could be utilized in current and future applications.
About ten years ago, Bob Lucky asked me for a list of open research questions in networking. I didn't have a ready list and reacted it would be good to have one. This essay is my (long- belated) reply.
Should a new "platform" target a functionality-rich but complex and expensive design or instead opt for a bare-bone but cheaper one? This is a fundamental question with profound implications for the eventual success of any platform. A general answer is, however, elusive as it involves a complex trade-off between benefits and costs. The intent of this paper is to introduce an approach based on standard tools from the field of economics, which can offer some insight into this difficult question. We demonstrate its applicability by developing and solving a generic model that incorporates key interactions between platform stakeholders. The solution confirms that the "optimal" number of features a platform should offer strongly depends on variations in cost factors. More interestingly, it reveals a high sensitivity to small relative changes in those costs. The paper's contribution and motivation are in establishing the potential of such a cross-disciplinary approach for providing qualitative and quantitative insights into the complex question of platform design.
In June 2011 I participated on a panel on network neutrality hosted at the June cybersecurity meeting of the DHS/SRI Infosec Technology Transition Council (ITTC), where "experts and leaders from the government, private, financial, IT, venture capitalist,and academia and science sectors came together to address the problem of identity theft and related criminal activity on the Internet." I recently wrote up some of my thoughts on that panel, including what network neutrality has to do with cybersecurity.