A Generic Language for Application-Specific Flow Sampling

By: 
Harsha V. Madhyastha and Balachander Krishnamurthy
Appears in: 
CCR April 2008

Flow records gathered by routers provide valuable coarse-granularity traffic information for several measurement-related network applications. However, due to high volumes of traffic, flow records need to be sampled before they are gathered. Current techniques for producing sampled flow records are either focused on selecting flows from which statistical estimates of traffic volume can be inferred, or have simplistic models for applications. Such sampled flow records are not suitable for many applications with more specific needs, such as ones that make decisions across flows.

As a first step towards tailoring the sampling algorithm to an application’s needs, we design a generic language in which any particular application can express the classes of traffic that are of interest to it. Our evaluation investigates the expressive power of our language, and whether flow records have sufficient information to enable sampling of records of relevance to applications. We use templates written in our custom language to instrument sampling tailored to three different applications: BLINC, Snort, and Bro. Our study, based on month-long datasets gathered at two different network locations, shows that by learning local traffic characteristics we can sample relevant flow records near-optimally with low false negatives in diverse applications.

Public Review By: 
Chadi Barakat

The monitoring of Internet traffic has several applications, such as anomaly detection, traffic classification, accounting, and traffic engineering. These applications have different requirements in terms of the volume of information to be analyzed. Some are satisfied with a simple summation of the payload sizes of all or a subset of packets, while others require a deep analysis of headers and payloads to detect whether something is going wrong. This is the main reason why traffic monitoring applications differ in how well they tolerate the generic, application-unaware traffic sampling deployed in routers to reduce the volume of collected traffic. Instead of sampling the traffic generically, the present paper proposes application-specific sampling: the application specifies its requirements in a dedicated language, and a sampler then uses these requirements to keep only the flow records that are of interest to the application. A generic language has been developed in this paper to this end, and it was applied and validated on three applications: Snort, Bro, and BLINC. The experimental results show that, with this language, the sampled trace retains almost the same volume of information relevant to these three applications as the original trace.
The originality of this paper lies in the generality of the proposed language and its ability to specify the needs of several traffic monitoring applications. This language could, on the one hand, reduce the volume of collected traffic and, on the other hand, make life easier for application developers. In the present work, the advantage of application-specific sampling has been validated on flow records. A validation on packet records could highlight more features of the proposed solution. A remaining open issue is how one can leverage such adaptive, application-aware sampling to reduce the overhead on routers, both in the core of the network and at the edge.