CrowdSurf: Empowering Transparency in the Web

By: 
Hassan Metwalley, Stefano Traverso, Marco Mellia (Politecnico di Torino), Stanislav Miskovic (Symantec Corp.), Mario Baldi (Politecnico di Torino)
Appears in: 
CCR October 2015

Individuals lack proper means to supervise the services they contact and the information they exchange when surfing the web. This security task has become challenging due to the complexity of the modern web and of the data-delivery technology, and even due to the adoption of encryption, which, while improving privacy, makes in-network services ineffective. The implications are serious, ranging from a person contacting undesired services or unwillingly exposing private information, to a company being unable to control the flow of its information to the outside world.

Public Review By: 
Joseph Camp

Users are often unaware that their sensitive information is being sent to a third party when they browse the web. Hence, this paper proposes a system called CrowdSurf, which audits data flows from a browser and informs users of privacy concerns. To do so, regular expressions are used to determine whether certain flows should be blocked, redirected, allowed, modified, or simply logged. CrowdSurf also allows users to contribute to a cloud-based system that infers which flows are harmful. CrowdSurf is evaluated on a live trace from an Internet Service Provider and shown to perform reliably with minimal data required.
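The review's description of regex rules mapping flows to actions (block, redirect, allow, modify, log) can be sketched as follows. This is a hypothetical illustration, not CrowdSurf's actual rule format: the patterns, URLs, and the `classify_flow` helper are all invented for the example.

```python
import re

# Hypothetical CrowdSurf-style rule set: each rule pairs a regular
# expression with an action. Patterns and hostnames are invented.
RULES = [
    (re.compile(r"tracker\.example\.com"), "block"),
    (re.compile(r"[?&](email|ssn)="), "modify"),  # e.g. strip sensitive query params
    (re.compile(r"ads\."), "log"),
]

def classify_flow(url: str) -> str:
    """Return the action for the first matching rule, or 'allow' by default."""
    for pattern, action in RULES:
        if pattern.search(url):
            return action
    return "allow"

print(classify_flow("http://tracker.example.com/pixel.gif"))        # block
print(classify_flow("http://site.example.org/page?email=me@x.com")) # modify
print(classify_flow("http://news.example.org/"))                    # allow
```

First-match semantics keep the decision deterministic; a crowd-sourced backend, as described in the review, would be one way to keep such a rule set up to date.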

Temporal locality in today's content caching: why it matters and how to model it

By: 
Stefano Traverso, Mohamed Ahmed, Michele Garetto, Paolo Giaccone, Emilio Leonardi, Saverio Niccolini
Appears in: 
CCR October 2013
Public Review By: 
Augustin Chaintreau

“I’m not attracted and we don't get along, but the timing is right.” (New Yorker cartoon by Barbara Smaller.) If timing, as we are often reminded, is everything, it is also a critical part of today's caching systems. The main reason is that the popularity of an item -- like a video on YouTube, as considered by the authors of this paper -- evolves quickly and can easily fool a simplistic cache management strategy. In particular, a heuristic model called the independent reference model is still often used in evaluating caching networks, and it states precisely that popularity is fixed in time and that a new request is independent of the past. In reality, requests typically occur in bursts, a phenomenon sometimes called temporal locality.

How much does this locality matter in today's caching? Which methods can be used in practice to extract and recognize it when it occurs? Are there models that retain some of the simplicity of the independent reference model while accounting for temporal locality?

This paper answers these questions by first conducting a longitudinal study of web traffic inside a few operational Points of Presence (PoPs) covering 60k users. It shows that temporal locality is present and that it generally implies that the performance derived for a simple caching scheme is unnecessarily pessimistic. It then makes two novel contributions. First, it proposes and evaluates a concrete method based on slicing and reshuffling to modulate the temporal locality in a trace, showing how it directly impacts performance. Second, it builds a new temporal model of request generation that occupies a middle ground between mathematical simplicity and realism. The model reuses the above slicing and is inspired by Shot Noise Processes, a relatively mature tool of applied probability that allows one to sum independent processes which themselves exhibit temporal evolution.
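The shot-noise idea the review describes can be sketched in a few lines: each content item is a "shot" that appears at a random time and generates requests at a rate that decays afterwards, so popularity rises and falls rather than staying fixed as the independent reference model assumes. This is a hedged, assumption-laden illustration, not the authors' exact formulation; all parameters below are invented.

```python
import random

random.seed(42)

def generate_requests(n_items=50, horizon=1000.0, mean_volume=20, lifetime=50.0):
    """Return a time-sorted list of (time, item_id) request events.

    Each item is a 'shot': it is born at a random time, draws an expected
    request volume, and spreads its requests over an exponentially
    decaying popularity profile (mean duration `lifetime`).
    """
    events = []
    for item in range(n_items):
        birth = random.uniform(0, horizon)              # when the item appears
        volume = random.expovariate(1.0 / mean_volume)  # expected total requests
        for _ in range(max(1, int(volume))):
            # request times cluster shortly after birth, then thin out
            t = birth + random.expovariate(1.0 / lifetime)
            if t < horizon:
                events.append((t, item))
    events.sort()
    return events

events = generate_requests()
print(f"{len(events)} requests over {len({i for _, i in events})} items")
```

Summing the per-item processes yields an aggregate request stream whose bursts come from overlapping item lifetimes, which is the qualitative behavior the review attributes to the Shot Noise Process model.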
None of these tools and methods had previously been applied to caching for on-demand video or other content delivery. All reviewers appreciated the substance and the timing of this work in proposing a different model of cache requests. In spite of various classical results, we still rely too blindly on the independent reference model, owing to its simplicity. In addition, this work was cited as a good example of applying sophisticated analysis to a real data set.

The reviews also highlighted some important limitations of the current results. Relying on a single cache algorithm (LRU) for all claims makes the paper less relevant in practice and reduces its surprise factor, since LRU has well-known deficiencies. This is exacerbated by the fact that only relatively small caches are considered, with correspondingly low hit probability -- a fact the authors recognize and one dictated by the limitations of the traces themselves. Finally, one of the main advantages of this new model (over cascade-based requests, etc.) is that it permits analysis, but that point is not stricto sensu demonstrated here. For those interested, the authors cite a follow-up paper, available but not yet published, that focuses on this point.

We hope that our community will react to this work with a renewed interest in more accurate models of caching requests, and that this will eventually narrow the gap between practical design and mathematical analysis for future networks.
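The interplay between temporal locality and LRU that the review discusses can be demonstrated with a small experiment in the spirit of the paper's reshuffling method: shuffling a request trace destroys temporal locality, and under LRU this typically lowers the hit ratio. The bursty trace below is synthetic and the cache size arbitrary; this is a sketch of the general effect, not a reproduction of the paper's PoP experiments.

```python
from collections import OrderedDict
import random

class LRUCache:
    """Minimal LRU cache: evicts the least recently used item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def request(self, item):
        """Return True on a hit; insert or refresh the item either way."""
        hit = item in self.store
        if hit:
            self.store.move_to_end(item)        # mark as most recently used
        else:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict least recently used
            self.store[item] = True
        return hit

def hit_ratio(trace, capacity):
    cache = LRUCache(capacity)
    return sum(cache.request(x) for x in trace) / len(trace)

random.seed(0)
# Bursty synthetic trace: each item's requests arrive back to back,
# mimicking strong temporal locality.
trace = [item for item in range(200) for _ in range(random.randint(1, 10))]
shuffled = trace[:]
random.shuffle(shuffled)  # destroys the temporal locality

print(f"bursty trace hit ratio:   {hit_ratio(trace, 20):.2f}")
print(f"shuffled trace hit ratio: {hit_ratio(shuffled, 20):.2f}")
```

The bursty trace achieves a much higher hit ratio than its shuffled counterpart at the same cache size, which illustrates why evaluations based on the independent reference model can be unnecessarily pessimistic, as the review notes.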
