Empirical Evaluation of Hash Functions for Multipoint Measurements

Christian Henke, Carsten Schmoll, and Tanja Zseby
Appears in: 
CCR July 2008

A broad spectrum of network measurement applications demand passive multipoint measurements in which data from multiple observation points has to be correlated. Examples are the passive measurement of one-way delay or the observation of the path that a packet takes through a network. Nevertheless, due to high data rates and the need for fine granular measurements, the resource consumption for passive measurements can be immense. Furthermore, the resource consumption depends on the traffic in the network, which usually is highly dynamic. Packet and ow-selection methods provide a solution to reduce and control the resource consumption for passive measurements. In order to apply such techniques to multipoint measurements the selection processes need to be synchronized. Hash-based selection is a deterministic packet selection based on a hash function computed on selected parts of the packet content. This selection decision is consistent throughout the network and enables packet tracing and the measurement of delay between network nodes. Because the selection is based on deterministic function it can introduce bias which leads to wrong estimation of traffic characteristics. In this paper we define a set of quality criteria and select methods to investigate which hash function is most suitable for hash-based packet selection. We analyze 23 non-cryptographic and 2 cryptographic hash functions. Experiments are performed with real traffic traces from different networks. Based on the results we recommend 2 fast hash functions which show low bias and sample a representative subset of the population.

Public Review By: 
Chadi Barakat

Traffic sampling is a key component in a passive measurement architecture especially when it comes to monitoring high speed networks. It has the advantage of reducing the volume of collected traffic, an important requirement for scalability. The main question stays however how to achieve this traffic reduction without impacting the accuracy of measurements and without harming the applications behind. This question becomes more difficult to answer when it comes to measurements carried out at different points inside the network. Randomness in sampling is known to reduce the bias but there is also a need to make measurement points collaborate together so that the same (randomly selected) traffic is not sampled several times at different points inside the network. Here comes the importance of hash functions (over random methods) which are the main subject of this paper. Indeed, one can use hash functions to (deterministically) select a subset of the traffic to measure and tune the hash functions of the different measurement points so that the total accuracy is improved and the bias is reduced.
Diverse functions have been proposed in the literature (BOB recommended by PSAMP, OAAT, CRC, etc). The present paper overviews these functions and propose an evaluation framework based on real traces and statistical tools.
The methods are compared with respect to their bias and speed and recommendations are made at the end of the paper. We believe that this paper, even though it does not bring any new hash function or sampling method, is worth reading for operators and researchers that want to work in this area. Its strength is in the number of hash functions it covers and the thorough experimental and statistical evaluation it proposes.