On-demand time-decaying bloom filters for telemarketer detection

Giuseppe Bianchi, Nico d'Heureuse, and Saverio Niccolini
Appears in: 
CCR October 2011

Several traffic monitoring applications may benefit from the availability of efficient mechanisms for approximately tracking smoothed time averages rather than raw counts. This paper provides two contributions in this direction. First, our analysis of Time-decaying Bloom filters, formerly proposed data structures devised to perform approximate Exponentially Weighted Moving Averages on streaming data, reveals two major shortcomings: biased estimation when measurements are read in arbitrary time instants, and slow operation resulting from the need to periodically update all the filter's counters at once. We thus propose a new construction, called On-demand Time-decaying Bloom filter, which relies on a continuous-time operation to overcome the accuracy/performance limitations of the original window-based approach. Second, we show how this new technique can be exploited in thedesign of high performance stream-based monitoring applications, by developing VoIPSTREAM, a proof-of-concept real-time analysis version of a formerly proposed system for telemarketing call detection. Our validation results, carried out over real telephony data, show how VoIPSTREAM closely mimics the feature extraction process and traffic analysis techniques implemented in the offline system, at a significantly higher processing speed, and without requiring any storage of per-user call detail records.

Public Review By: 
Augustin Chaintreau

This paper is about Bloom filters, and the goal of this review is to convince you that its presentation in CCR is a true positive. Here is why: No one (except perhaps a very lonely person) likes to receive calls from telemarketers, but neither do we like to have all our calls monitored and information stored in a profile in order to prevent such events. Sometimes it's against the law. Can we mitigate this tension with better data structures? That is the question that this paper addresses. More precisely it proposes to answer this problem using a variant of Time-decaying Bloom Filter that can be written using continuous time, which seems a priori simpler and turns out to address as well previous shortcomings. It also shows that this idea is practical: a comparison with a previous VoIP monitoring system shows that this design achieves similar performance with an online approach. Some questions remain open: How much can detection actually be improved, in particular w.r.t. the rate of decaying ? What are the exact memory and running time complexity in this online approach? and how can telemarketers take advantage of this data structure to cheat? All reviewers nevertheless felt that this problem is well motivated and explained, that the idea is compelling, and that the validation has a point. As the tension between privacy and need for better services is constantly under scrutiny in our networked world, we also believed that such approach might inspire more work on elegant and practical streaming algorithm.