Results 1 - 10 of 91
A survey of techniques for internet traffic classification using machine learning
Communications Surveys & Tutorials, IEEE
Cited by 135 (4 self)
Abstract—The research community has begun looking for IP traffic classification techniques that do not rely on ‘well known’ TCP or UDP port numbers, or interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification: an inter-disciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.
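The "statistical traffic characteristics" this survey covers are simple per-flow summaries that feed an ML classifier. A minimal sketch (with hypothetical packet sizes and timestamps; the function and field names are illustrative, not from the survey) of turning one flow into a feature vector:

```python
from statistics import mean, pstdev

def flow_features(packet_sizes, arrival_times):
    """Summarize one flow into ML-ready statistics (illustrative feature set)."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return {
        "pkt_count": len(packet_sizes),
        "mean_size": mean(packet_sizes),        # bytes
        "std_size": pstdev(packet_sizes),
        "mean_iat": mean(gaps) if gaps else 0.0,  # inter-arrival time, seconds
        "total_bytes": sum(packet_sizes),
    }

# Hypothetical flow: sizes in bytes, arrival times in seconds.
features = flow_features([60, 1500, 1500, 52], [0.00, 0.01, 0.02, 0.05])
print(features["pkt_count"], features["total_bytes"])  # 4 3112
```

Vectors like this, computed per flow, are what the surveyed ML strategies cluster or classify instead of ports or payload bytes.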
Polyglot: automatic extraction of protocol message format using dynamic binary analysis
In CCS ’07: Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007
Cited by 109 (21 self)
Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation, without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by using clustering on network traces. That kind of approach is limited by the lack of semantic information on network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition—the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message format included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
Traffic Classification Using Clustering Algorithms
Proceedings of the ACM SIGCOMM Workshop on Mining Network Data (MineNet), 2006
Cited by 95 (5 self)
Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
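The K-Means idea the paper applies to transport-layer statistics can be sketched in a few lines. The feature vectors below (mean packet size in bytes, mean inter-arrival in ms) are hypothetical, and the naive initialization is only adequate for a toy example, not for real traces:

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    centroids = list(points[:k])                 # naive init, fine for a sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical flows: bulk-transfer-like vs. interactive-like traffic.
flows = [(1400, 2), (1350, 3), (1420, 2), (80, 200), (95, 180), (70, 210)]
cents, groups = kmeans(flows, k=2)
print(sorted(len(g) for g in groups))  # the two traffic types separate: [3, 3]
```

DBSCAN differs in that it grows clusters from density-reachable points and needs no fixed k, which is why the paper reports it producing tighter clusters at lower overall accuracy.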
Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection
In Proc. USENIX Security Symposium, 2006
Cited by 82 (18 self)
Many network intrusion detection systems (NIDS) rely on protocol-specific analyzers to extract the higher-level semantic context from a traffic stream. To select the correct kind of analysis, traditional systems exclusively depend on well-known port numbers. However, based on our experience, increasingly significant portions of today’s traffic are not classifiable by such a scheme. Yet for a NIDS, this traffic is very interesting, as a primary reason for not using a standard port is to evade security and policy enforcement monitoring. In this paper, we discuss the design and implementation of a NIDS extension to perform dynamic application-layer protocol analysis. For each connection, the system first identifies potential protocols in use and then activates appropriate analyzers to verify the decision and extract higher-level semantics. We demonstrate the power of our enhancement with three examples: reliable detection of applications not using their standard ports, payload inspection of FTP data transfers, and detection of IRC-based botnet clients and servers. Prototypes of our system currently run at the border of three large-scale operational networks. Due to its success, the bot detection is already integrated into dynamic inline blocking of production traffic at one of the sites.
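The two-stage flow the abstract describes (cheap signatures nominate candidate protocols, then a full analyzer verifies) can be caricatured as follows. The signatures, payloads, and verification rule are illustrative stand-ins, not the system's actual detectors:

```python
# Toy two-stage dispatch: signature match proposes, analyzer check confirms.
SIGNATURES = {
    b"GET ": "http",
    b"NICK": "irc",
}

def candidates(first_bytes):
    """Stage 1: cheap byte-pattern match over the start of the stream."""
    return [proto for sig, proto in SIGNATURES.items()
            if first_bytes.startswith(sig)]

def verify(proto, first_bytes):
    """Stage 2: stand-in for a real protocol analyzer confirming the guess."""
    if proto == "http":
        return b"HTTP/" in first_bytes   # a request must carry a version token
    return True

def classify(first_bytes):
    for proto in candidates(first_bytes):
        if verify(proto, first_bytes):
            return proto
    return "unknown"

print(classify(b"GET /index.html HTTP/1.1\r\n"))  # http, on any port
print(classify(b"\x00\x01binary junk"))           # unknown
```

The point of the second stage is exactly what the paper stresses: a signature hit on a non-standard port is only a hypothesis until the protocol analyzer parses the stream successfully.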
Automatic Network Protocol Analysis
Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS ’08), 2008
Network Monitoring using Traffic Dispersion Graphs (TDGs)
2007
Cited by 52 (9 self)
Monitoring network traffic and detecting unwanted applications has become a challenging problem, since many applications obfuscate their traffic using unregistered port numbers or payload encryption. Apart from some notable exceptions, most traffic monitoring tools use two types of approaches: (a) keeping traffic statistics such as packet sizes and interarrivals, flow counts, byte volumes, etc., or (b) analyzing packet content. In this paper, we propose the use of Traffic Dispersion Graphs (TDGs) as a way to monitor, analyze, and visualize network traffic. TDGs model the social behavior of hosts (“who talks to whom”), where the edges can be defined to represent different interactions (e.g. the exchange of a certain number or type of packets). With the introduction of TDGs, we are able to harness a wealth of tools and graph modeling techniques from a diverse set of disciplines.
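A Traffic Dispersion Graph in the paper's sense is just a graph whose nodes are hosts and whose edges record "who talks to whom". A minimal sketch with hypothetical flow records (here an undirected edge per observed src/dst pair):

```python
from collections import defaultdict

def build_tdg(flows):
    """Build an undirected TDG as an adjacency map: host -> set of peers."""
    graph = defaultdict(set)
    for src, dst in flows:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph

# Hypothetical flow records (source IP, destination IP).
flows = [("10.0.0.1", "10.0.0.2"), ("10.0.0.1", "10.0.0.3"),
         ("10.0.0.4", "10.0.0.1"), ("10.0.0.2", "10.0.0.3")]
tdg = build_tdg(flows)
print(len(tdg["10.0.0.1"]))  # degree of a "talkative" host: 3
```

As the abstract notes, the edge definition is a parameter: one could instead add an edge only after a certain number or type of packets, yielding different graphs from the same trace.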
Offline/Online Traffic Classification Using Semi-Supervised Learning
Perform. Eval., 2007
Cited by 48 (7 self)
Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially of those with a desire to be undetectable. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to classify traffic by exploiting distinctive flow characteristics of applications when they communicate on a network. In this paper, we explore this latter approach and propose a semi-supervised classification method that can accommodate both known and unknown applications. To the best of our knowledge, this is the first work to use semi-supervised learning techniques for the traffic classification problem. Our approach allows classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows. We consider pragmatic classification issues such as longevity of classifiers and the need for retraining of classifiers. Our performance evaluation using empirical Internet traffic traces that span a 6-month period shows that: 1) high flow and byte classification accuracy (i.e., greater than 90%) can be achieved using training data that consists of a small number of labeled and a large number of unlabeled flows; 2) presence of “mice” and “elephant” flows in the Internet complicates the design of classifiers, especially of those with high byte accuracy, and necessitates the use of weighted sampling techniques to obtain training flows; and 3) retraining of classifiers is necessary only when there are non-transient changes in the network usage characteristics. As a proof of concept, we implement prototype offline and real-time classification systems to demonstrate the feasibility of our approach.
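The semi-supervised idea here is cluster-then-label: cluster labeled and unlabeled flows together, then name each cluster after the majority label of the few labeled flows it contains. A sketch of just the labeling step, with hypothetical flow IDs and a precomputed cluster assignment standing in for the actual clustering:

```python
from collections import Counter

def label_clusters(cluster_of, labels):
    """cluster_of: flow_id -> cluster id; labels: flow_id -> app (few entries).
    Returns cluster id -> majority application label."""
    votes = {}
    for fid, app in labels.items():
        votes.setdefault(cluster_of[fid], Counter())[app] += 1
    return {c: v.most_common(1)[0][0] for c, v in votes.items()}

# Hypothetical clustering of five flows; only two of them are labeled.
cluster_of = {"f1": 0, "f2": 0, "f3": 1, "f4": 1, "f5": 1}
labels = {"f1": "web", "f4": "p2p"}
name = label_clusters(cluster_of, labels)
unlabeled_pred = {f: name[c] for f, c in cluster_of.items()}
print(unlabeled_pred["f2"], unlabeled_pred["f5"])  # web p2p
```

This is why a handful of labeled flows can go a long way: each label is amortized over every unlabeled flow that lands in the same cluster. Clusters containing no labeled flows are the "unknown applications" the abstract mentions.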
Unconstrained Endpoint Profiling: Googling the Internet
In ACM SIGCOMM, 2008
Cited by 35 (3 self)
Understanding Internet access trends at a global scale, i.e., what do people do on the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges owing to either privacy concerns or to other operational difficulties. The key hypothesis of our work here is that most of the information needed to profile the Internet endpoints is already available around us — on the web. In this paper, we introduce a novel approach for profiling and classifying endpoints. We implement and deploy a Google-based profiling tool, which accurately characterizes endpoint behavior by collecting and strategically combining information freely available on the web. Our ‘unconstrained endpoint profiling’ approach shows remarkable advances in the following scenarios: (i) Even when no packet traces are available, it can accurately predict application and protocol usage trends at arbitrary networks; (ii) When network traces are available, it dramatically outperforms state-of-the-art classification tools; (iii) When sampled flow-level traces are available, it retains high classification capabilities when other schemes literally fall apart. Using this approach, we perform unconstrained endpoint profiling at a global scale: for clients in four different world regions (Asia, South and North America, and Europe). We provide the first-of-its-kind endpoint analysis which reveals fascinating similarities and differences among these regions.
Graph-based P2P Traffic Classification at the Internet Backbone
Cited by 29 (5 self)
Monitoring network traffic and classifying applications are essential functions for network administrators. In this paper, we consider the use of Traffic Dispersion Graphs (TDGs) to classify network traffic. Given a set of flows, a TDG is a graph with an edge between any two IP addresses that communicate; thus TDGs capture network-wide interactions. Using TDGs, we develop an application classification framework dubbed Graption (Graph-based classification). Our framework provides a systematic way to harness the power of network-wide behavior, flow-level characteristics, and data mining techniques. As a proof of concept, we instantiate our framework to detect P2P applications, and show that it can identify P2P traffic with recall and precision greater than 90% in backbone traces, which are particularly challenging for other methods.
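One graph-level cue this line of work exploits for P2P detection is that in a directed TDG, P2P hosts tend to both initiate and receive connections, while client-server traffic splits hosts into pure sources and pure sinks. The metric below is illustrative of that intuition and is not claimed to be Graption's exact pipeline; the edge lists are hypothetical:

```python
def in_and_out_ratio(edges):
    """Fraction of hosts that appear as both a source and a destination."""
    sources = {s for s, _ in edges}
    dests = {d for _, d in edges}
    nodes = sources | dests
    return len(sources & dests) / len(nodes)

# Hypothetical directed edges (initiator, responder).
p2p_like = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "b"), ("b", "d")]
client_server = [("c1", "srv"), ("c2", "srv"), ("c3", "srv")]
print(in_and_out_ratio(p2p_like), in_and_out_ratio(client_server))  # 1.0 0.0
```

A framework like Graption would combine graph metrics of this kind with per-flow features before handing them to a data mining classifier.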
KISS: Stochastic Packet Inspection Classifier for UDP Traffic
Cited by 17 (4 self)
This paper proposes KISS, a novel Internet classification engine. Motivated by the expected rise of UDP traffic, which stems from the momentum of P2P streaming applications, we propose a novel classification framework which leverages a statistical characterization of payload. Statistical signatures are derived by means of a Chi-Square-like test, which extracts the protocol “format” but ignores the protocol “semantic” and “synchronization” rules. The signatures feed a decision process based either on the geometric distance among samples, or on Support Vector Machines. KISS is very accurate, and its signatures are intrinsically robust to packet sampling, reordering, and flow asymmetry, so that it can be used on almost any network. KISS is tested in different scenarios, considering traditional client-server protocols, VoIP and both traditional and new P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.1%, while results are almost perfect when dealing with new P2P streaming applications.
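The Chi-Square-like signature idea can be sketched per byte offset: across many packets of a flow, compare the observed distribution of values at each offset with a uniform one. Constant protocol-format fields score high, random payload scores low. The payloads below are hypothetical and this uses whole bytes rather than whatever chunking KISS actually adopts:

```python
from collections import Counter

def chi_square_per_offset(payloads, width):
    """Chi-square score vs. a uniform byte distribution, per payload offset."""
    n = len(payloads)
    expected = n / 256                       # uniform expectation per value
    scores = []
    for i in range(width):
        counts = Counter(p[i] for p in payloads)
        observed = sum((c - expected) ** 2 / expected for c in counts.values())
        unseen = (256 - len(counts)) * expected   # values with zero count
        scores.append(observed + unseen)
    return scores

# Hypothetical packets: a constant "message type" byte, then a varying byte.
payloads = [bytes([0x17, k]) for k in range(64)]
fixed, varying = chi_square_per_offset(payloads, 2)
print(fixed > varying)  # True: the constant-format offset stands out
```

A vector of such per-offset scores is payload-derived yet content-agnostic, which is why signatures of this kind survive encryption-like randomness in the data fields while still capturing the protocol “format”.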