Datix: A System for Scalable Network Analytics

By: 
Dimitrios Sarlis, Nikolaos Papailiou, Ioannis Konstantinou (CSLAB, NTUA), Georgios Smaragdakis (MIT & TU Berlin), Nectarios Koziris (CSLAB, NTUA)
Appears in: 
CCR October 2015

Ever-increasing Internet traffic poses challenges for network operators and administrators, who have to analyze large network datasets in a timely manner to make decisions regarding network routing, dimensioning, accountability, and security. Network datasets collected at large networks such as Internet Service Providers (ISPs) or Internet Exchange Points (IXPs) can be on the order of terabytes per hour. Unfortunately, most current network analysis approaches are ad hoc and centralized, and thus not scalable.

Public Review By: 
Marco Mellia

Public Review for Datix: A System for Scalable Network Analytics
Dimitrios Sarlis, Nikolaos Papailiou, Ioannis Konstantinou, Georgios Smaragdakis, and Nectarios Koziris

Big Data is a hot topic, and the Internet is one of the few sources from which large amounts of data can be collected. It is not surprising, then, to see researchers trying to exploit Big Data techniques to analyze Internet data. This work goes in this direction and applies Big Data methodologies to network monitoring and management.

The authors propose Datix, a fully decentralized network traffic analytics engine for querying very large datasets using existing map-reduce infrastructures. The key contribution is the ability to do efficient distributed joins between network traffic data (in this case, sFlow packet samples) and metadata about fields in that data (e.g., IP to AS number mappings), a key primitive operation in many network traffic analysis studies. The data model is a star schema with a large log table and smaller dimension tables, which are partitioned by key at load time. At runtime, each query is mapped to the relevant partitions that contain the data, and the resulting query is passed to Shark or Hive for execution. The result is a fast and scalable system that is particularly well suited to the analysis of network management traces.

Reviewers found this paper to be interesting and well motivated, even if incremental. Despite the limited novelty of the proposed work, reviewers found Datix to be an important contribution that allows existing infrastructure to be applied to very common network measurement tasks, for which MapReduce is somewhat underutilized in practice. In addition, Datix is open source and available on GitHub.
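
The partitioned join that the review describes can be illustrated with a small, single-process sketch. This is not Datix's implementation, which hands queries to Shark or Hive on a cluster. It only shows the general idea of partitioning a log table and a dimension table by the same key at load time, so that a query touches only the relevant partitions. The partitioning rule (first octet of an IPv4 address), all function names, and the sample data are assumptions made for this example. The prefixes and AS numbers come from ranges reserved for documentation.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical dimension table: IPv4 prefix -> AS number.
DIM = [
    ("192.0.2.0/24", 64496),
    ("198.51.100.0/24", 64497),
    ("203.0.113.0/24", 64498),
]

# Hypothetical log table: (source IP, bytes) per packet sample.
SAMPLES = [
    ("192.0.2.10", 1500),
    ("192.0.2.77", 500),
    ("198.51.100.5", 64),
    ("203.0.113.9", 1000),
]


def partition_key(ip: str) -> int:
    """Assumed partitioning rule: the first octet of an IPv4 address."""
    return int(ip_address(ip)) >> 24


def load_dimension(prefix_to_asn):
    """Partition the small dimension table by key at load time."""
    parts = defaultdict(list)
    for prefix, asn in prefix_to_asn:
        net = ip_network(prefix)
        # A prefix shorter than /8 spans several keys, so store it in each.
        first = int(net.network_address) >> 24
        last = int(net.broadcast_address) >> 24
        for key in range(first, last + 1):
            parts[key].append((net, asn))
    return parts


def load_log(samples):
    """Partition the large log table with the same key."""
    parts = defaultdict(list)
    for src_ip, nbytes in samples:
        parts[partition_key(src_ip)].append((ip_address(src_ip), nbytes))
    return parts


def bytes_per_source_as(log_parts, dim_parts, keys=None):
    """Join each selected log partition with the matching dimension partition.

    If `keys` is given, only those partitions are scanned.
    """
    totals = defaultdict(int)
    selected = keys if keys is not None else list(log_parts)
    for key in selected:
        dims = dim_parts.get(key, [])
        for ip, nbytes in log_parts.get(key, []):
            # Longest-prefix match within the co-located dimension partition.
            matches = [(net.prefixlen, asn) for net, asn in dims if ip in net]
            if matches:
                totals[max(matches)[1]] += nbytes
    return dict(totals)


if __name__ == "__main__":
    dim_parts = load_dimension(DIM)
    log_parts = load_log(SAMPLES)
    print(bytes_per_source_as(log_parts, dim_parts))
    print(bytes_per_source_as(log_parts, dim_parts, keys=[192]))
```

Because both tables share a partitioning key, each log partition needs only its own slice of the dimension table. A query restricted to one key range, such as `keys=[192]` above, never reads the other partitions.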