Opcode sequences as representation of executables for data-mining-based unknown malware detection (2013)
Venue: | INFORMATION SCIENCES 227 |
Citations: | 12 - 0 self |
Citations
13215 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...ftware). Maximal margin is defined by the largest distance between the examples of the two classes computed from the distance between the closest instances of both classes (called supporting vectors) =-=[80]-=-. Figure 3: Example of a SVM classifier in a bi-dimensional space. Formally, the optimal hyperplane is represented by a vector w and a scalar m in a way that the inner products of w with vectors ϕ(Xi)... |
6600 |
C4.5: Programs For Machine Learning
- Quinlan
- 1993
Citation Context ...andom Forest, which is an ensemble (i.e., combination of weak classifiers) of different randomly-built decision trees [7]. Further, we also use J48, the Weka [25] implementation of the C4.5 algorithm =-=[61]-=-. 4.2. Support Vector Machines (SVM) SVM classifiers consist of a hyperplane dividing an n-dimensional-space-based representation of the data into two regions (shown in Figure 3). This hyperplane is ... |
5443 |
Artificial Intelligence: A Modern Approach
- Russell, Norvig
- 2003
Citation Context ...on). - K-Nearest Neighbour (KNN): We performed experiments over the range k = 1 to k = 10 to train KNN. - Bayesian networks (BN): We used several structural learning algorithms: K2 [18], Hill Climber =-=[64]-=- and Tree Augmented Naïve (TAN) [26]. We also performed experiments with a Naïve Bayes classifier [41]. - Support Vector Machines (SVM): We used a Sequential Minimal Optimization (SMO) algorithm [57... |
4017 |
Introduction to modern information retrieval
- Salton, McGill
- 1983
Citation Context ...ues. Besides, long opcode sequences will introduce a high performance overhead. Afterwards, we compute the frequency of occurrence of each opcode sequence within the file by using Term Frequency (TF) =-=[48]-=- (shown in equation 2), which is a weight widely used in information retrieval [83]: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$ (2) where $n_{i,j}$ is the number of times the sequence $s_{i,j}$ (in our case opcode sequence) appears in... |
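The Term Frequency weighting quoted in the context above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the cited paper's implementation: the function names `opcode_ngrams` and `term_frequencies` and the sample opcode list are hypothetical, and sequences are taken as overlapping n-grams of length `n`.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the overlapping opcode sequences of length n (hypothetical helper)."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(opcodes, n=2):
    """Compute tf = n_ij / sum_k n_kj for every opcode sequence in one executable."""
    grams = opcode_ngrams(opcodes, n)
    total = len(grams)  # total number of sequences (terms) in the executable
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Illustrative opcode stream; real input would come from a disassembler.
example = ["push", "mov", "add", "push", "mov"]
tf = term_frequencies(example, n=2)
```

Each value is the share of the executable's sequences accounted for by that sequence, so the values for one executable sum to 1.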
3408 |
Principal Component Analysis
- Jolliffe
- 1986
Citation Context ...ly information about benign software to measure the deviations from the benign behaviour profile. Specifically, they applied a Gaussian Likelihood Model fitted with Principal Component Analysis (PCA) =-=[31]-=-. Unfortunately, these methods usually have a high false positive ratio that renders them difficult for commercial antivirus vendors to adopt. Data-mining-based approaches rely on datasets that includ... |
1374 |
A vector space model for automatic indexing
- Salton, Wong, et al.
- 1975
Citation Context ... such as push, mov or add, tended to be weighted low in the results. These weights may be considered a replacement for the Inverse Document Frequency (IDF) measure [62] used in the Vector Space Model =-=[65]-=- for information retrieval. The IDF weighting terms occur in documents based on the frequency with which they appear in the whole document. Our method performs a similar task by using mutual informati... |
1282 | A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
- Kohavi
- 1995
Citation Context ...efined with the parameter k. The results of each round are averaged to estimate the global measures of the tested model. Thereby, for each classifier we tested, we performed a k-fold cross validation =-=[35]-=- with k = 10. In this way, our dataset was split 10 times into 10 different sets of learning (90% of the total dataset) and testing (10% of the total data). Learning the model: For each validation s... |
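The 10-fold split described in the context above (ten rounds, each with 90% of the data for learning and 10% for testing) might be sketched as below. The generator name `k_fold_indices` is hypothetical, and the interleaved, unshuffled fold assignment is an assumption made for brevity; the excerpt does not say how instances were assigned to folds.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for each of the k validation rounds."""
    indices = list(range(n_samples))
    # Assumption: deal indices out round-robin, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(100, k=10))
```

Averaging a metric over the ten `(train, test)` rounds gives the global estimate the excerpt refers to.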
1062 |
Pattern recognition and machine learning
- Bishop
- 2009
Citation Context ...9, 81, 70]. Machine learning is a discipline within Artificial Intelligence (AI) concerned with the design and development of algorithms that allow computers to reason and make decisions based on data =-=[5]-=-. Generally, machine-learning algorithms can be classified into three different types: supervised learning, unsupervised learning and semi-supervised learning algorithms. First, supervised machine-lea... |
961 | A study of smoothing methods for language models applied to information retrieval.
- Zhai, Lafferty
- 2004
Citation Context ...fterwards, we compute the frequency of occurrence of each opcode sequence within the file by using Term Frequency (TF) [48] (shown in equation 2), which is a weight widely used in information retrieval =-=[83]-=-: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$ (2) where $n_{i,j}$ is the number of times the sequence $s_{i,j}$ (in our case opcode sequence) appears in an executable $e$, and $\sum_k n_{k,j}$ is the total number of terms in the executable $e$ (... |
796 | Bayesian network classifiers.
- Friedman, Geiger, et al.
- 1997
Citation Context ...performed experiments over the range k = 1 to k = 10 to train KNN. - Bayesian networks (BN): We used several structural learning algorithms: K2 [18], Hill Climber [64] and Tree Augmented Naïve (TAN) =-=[26]-=-. We also performed experiments with a Naïve Bayes classifier [41]. - Support Vector Machines (SVM): We used a Sequential Minimal Optimization (SMO) algorithm [57] and performed experiments with a po... |
688 |
An essay toward solving a problem in the doctrine of chances
- Bayes
- 1958
Citation Context ...ng the class of the unknown sample; the most commonly used technique is to classify the unknown instance as the most common class among its K-Nearest Neighbours. 4.4. Bayesian Networks Bayes' Theorem =-=[2]-=- is the basis of the so-called Bayesian inference, a statistical reasoning method that determines, based on a number of observations, the probability that a hypothesis may be true. Bayes’ theorem adju... |
571 | Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy
- Peng, Long, et al.
- 2005
Citation Context ... dataset and the malicious software dataset. 3. Computation of opcode relevance: We computed the relevance of each opcode based on the frequency with which it appears in each dataset. We used Mutual Information =-=[53]-=- (shown in equation 1) to measure the statistical dependence of the two variables: $I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$ (1) where X is the opcode frequency and Y is the class of t... |
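Equation 1 in the context above can be illustrated with a small empirical estimate. The sketch assumes that the opcode frequency X has already been discretised (here simply 0/1 for absent/present) and uses the natural logarithm; neither choice is stated in the excerpt, and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) from paired samples using empirical probabilities."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts for p(x, y)
    px = Counter(xs)              # counts for p(x)
    py = Counter(ys)              # counts for p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy data: the opcode is present only in the malware samples.
dependent = mutual_information([0, 0, 1, 1],
                               ["benign", "benign", "malware", "malware"])
# Toy data: opcode presence is unrelated to the class.
independent = mutual_information([0, 1, 0, 1],
                                 ["benign", "benign", "malware", "malware"])
```

An opcode whose presence is independent of the class scores 0, while one that perfectly separates two equally sized classes scores log 2.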
499 | Naive (Bayes) at forty: the independence assumption in information retrieval
- Lewis
- 1998
Citation Context ...- Bayesian networks (BN): We used several structural learning algorithms: K2 [18], Hill Climber [64] and Tree Augmented Naïve (TAN) [26]. We also performed experiments with a Naïve Bayes classifier =-=[41]-=-. - Support Vector Machines (SVM): We used a Sequential Minimal Optimization (SMO) algorithm [57] and performed experiments with a polynomial kernel [1], a normalised polynomial kernel [1], Pearson VI... |
468 |
Semi-Supervised Learning.
- Chapelle, Scholkopf, et al.
- 2006
Citation Context ...be labelled [38]. Finally, semi-supervised machine-learning algorithms use a mixture of both labelled and unlabelled data in order to build models, thus improving the accuracy of unsupervised methods =-=[12]-=-. Since in our case malware can be properly labelled, we use supervised machine-learning. In future work, however, we would also like to test the ability of unsupervised methods to detect malware. In ... |
461 | Sequential minimal optimization: A fast algorithm for training support vector machines.
- Platt
- 1998
Citation Context ...64] and Tree Augmented Naïve (TAN) [26]. We also performed experiments with a Naïve Bayes classifier [41]. - Support Vector Machines (SVM): We used a Sequential Minimal Optimization (SMO) algorithm =-=[57]-=- and performed experiments with a polynomial kernel [1], a normalised polynomial kernel [1], Pearson VII function-based universal kernel [79], and a Radial Basis Function (RBF) based kernel [1]. Tes... |
440 | The boosting approach to machine learning: an overview
- Schapire
- 2002
Citation Context ...ct and classify malware. We provide empirical validation of our method with an extensive study (footnote 3: Boosting is a machine learning technique that builds a strong classifier composed of weak classifiers =-=[68]-=-.) of data-mining models for detecting and classifying unknown malicious software. We show that the proposed methods achieve high detection rates, even for completely new and previously unseen thre... |
423 |
Discriminatory analysis: nonparametric discrimination. Project 21-49-004, report no
- Fix, Hodges
- 1951
Citation Context ...el functions are applied. These kernel functions lead to nonlinear classification surfaces, such as polynomial, radial or sigmoid surfaces [1]. 4.3. K-Nearest Neighbours The K-Nearest Neighbour (KNN) =-=[24]-=- algorithm is one of the simplest supervised machine-learning algorithms for classifying instances. This method classifies an unknown instance based on the class (in our case, malware or benign softwa... |
402 | The foundations of cost-sensitive learning
- Elkan
- 2001
Citation Context ...listing). On the other hand, cost-sensitive learning is a machine-learning technique where one can specify the cost of each error and the classifiers are trained taking into account that consideration =-=[22]-=-. Furthermore, we measured accuracy, i.e., the total number of the classifier’s hits divided by the number of instances in the whole dataset (shown in equation 11): Accuracy(%) = TP + TN TP + FP + TP ... |
314 |
Data Preparation for Data Mining
- Pyle
- 1999
Citation Context ...orage and time costs, it is necessary to reduce the original training set [19]. In order to solve this issue, data reduction is normally considered an appropriate preprocessing optimisation technique =-=[59, 78]-=-. Such techniques have many potential advantages such as reducing measurement, storage and Figure 14: Comparison of the results in terms of FPR for the combination of features of opcode-sequence lengt... |
243 |
The Art of Computer Virus Research and Defense
- Szor
- 2005
Citation Context ...ad, allowing further static or dynamic analysis of the executable. Another solution is to use concrete unpacking routines to recover the actual payload, which requires one routine per packing algorithm =-=[75]-=-. Obviously, this approach is limited to a fixed set of known packers. (Figure 12: Comparison of the results in terms of FPR of the classifiers for an opcode-sequence length of 2.) Likewise, commercia... |
225 |
Expert Systems and Probabilistic Network Models
- Castillo, Gutierrez, et al.
- 1997
Citation Context ...) = $\frac{P(B|A) \cdot P(A)}{P(B)}$ (7) Bayesian networks [52] are probabilistic models for multivariate analysis. Formally, they are directed acyclic graphs associated with a probability distribution function =-=[11]-=-. Nodes in the graph represent variables (which can be either a premise or a conclusion) while the arcs represent conditional dependencies between such variables. The probability function illustrates ... |
201 | Semantics-aware malware detection.
- Christodorescu, Jha, et al.
- 2005
Citation Context ...st of these methods are limited as they count certain bytes in the malware body; because most of the common transformations operate at the source level, these detection methods can be easily thwarted =-=[14]-=-. These approaches were also used by Perdisci et al. [54] to detect packed executables. Perdisci et al. [54] proposed their first approach based on (i) the extraction of some features from the Portabl... |
188 | Supervised machine learning: a review of classification techniques.
- Kotsiantis
- 2007
Citation Context ...arning algorithms. First, supervised machine-learning algorithms, or classifying algorithms, require the training dataset to be properly labelled (in our case, knowing whether an instance is malware) =-=[37]-=-. Second, unsupervised machine-learning algorithms, or clustering algorithms, try to assess how data are organised into different groups called clusters. In this type of machine-learning, data do not ... |
170 | Reverend bayes on inference engines: A distributed hierarchical approach
- Pearl
- 1982
Citation Context ...ned if we know the probability that A occurs, P(A), the probability that B occurs, P(B), and the conditional probability of B given A, P(B|A). $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$ (7) Bayesian networks =-=[52]-=- are probabilistic models for multivariate analysis. Formally, they are directed acyclic graphs associated with a probability distribution function [11]. Nodes in the graph represent variables (which ... |
169 | Understanding inverse document frequency: On theoretical arguments for idf
- Robertson
Citation Context ...served that the most common opcodes, such as push, mov or add, tended to be weighted low in the results. These weights may be considered a replacement for the Inverse Document Frequency (IDF) measure =-=[62]-=- used in the Vector Space Model [65] for information retrieval. The IDF weighting terms occur in documents based on the frequency with which they appear in the whole document. Our method performs a si... |
162 |
Feature extraction by non parametric mutual information maximization
- Torkkola
- 2003
Citation Context ... training and testing times; confronting the curse of dimensionality to improve prediction performance in terms of speed, accuracy and simplicity and facilitating data visualization and understanding =-=[76, 20]-=-. Data reduction can be implemented in two ways. On the one hand, Instance Selection (IS) seeks to reduce the evidences (i.e., number of rows) in the training set by selecting the most relevant instan... |
155 | Data Mining Methods for Detection of New Malicious Executables
- Schultz, Eskin, et al.
- 2001
Citation Context ...veral characteristic features of both malicious samples and benign software to build classification tools that detect malware in the wild (i.e., undocumented malware). To this end, Schultz et al. =-=[69]-=- were the first to introduce the idea of applying data-mining models to the detection of different malicious programs based on their respective binary codes. Specifically, they applied several classif... |
149 | Static analysis of executables to detect malicious patterns.
- Christodorescu, Jha
- 2003
Citation Context ...presentation of the execution of binary executables and improved the slicing of a program into idioms (i.e., sequences of instructions). It is also worth mentioning the work of Christodorescu and Jha =-=[16]-=-, who proposed a method based on CFG analysis to handle obfuscations in malicious software. Later, Christodorescu et al. [17] improved on this work by including semantic-templates of malicious specifi... |
116 |
Selection of Relevant Features and Examples
- Blum, Langley
- 1997
Citation Context ...uences. Because both IS and FS are very effective at reducing the size of the training set and helping to filtrate and clean noisy data, thereby improving the accuracy of machine-learning classifiers =-=[6, 21]-=-, we strongly encourage the use of these methods. 7. Conclusions Malware detection has become a major topic of research and concern owing to the increasing growth of malicious code in recent years. Th... |
89 | Improving support vector machine classifiers by modifying kernel functions, Neural Networks
- Amari, Wu
- 1999
Citation Context ...enerally, instead of using inner products, the so-called kernel functions are applied. These kernel functions lead to nonlinear classification surfaces, such as polynomial, radial or sigmoid surfaces =-=[1]-=-. 4.3. K-Nearest Neighbours The K-Nearest Neighbour (KNN) [24] algorithm is one of the simplest supervised machine-learning algorithms for classifying instances. This method classifies an unknown inst... |
88 |
A Bayesian Method for Constructing Bayesian Belief Networks from Databases
- Cooper, Herskovits
- 1991
Citation Context ...5 [61] implementation). - K-Nearest Neighbour (KNN): We performed experiments over the range k = 1 to k = 10 to train KNN. - Bayesian networks (BN): We used several structural learning algorithms: K2 =-=[18]-=-, Hill Climber [64] and Tree Augmented Naïve (TAN) [26]. We also performed experiments with a Naïve Bayes classifier [41]. - Support Vector Machines (SVM): We used a Sequential Minimal Optimization ... |
87 |
Consistency-based search in feature selection
- Dash, Liu
- 2003
Citation Context ... training and testing times; confronting the curse of dimensionality to improve prediction performance in terms of speed, accuracy and simplicity and facilitating data visualization and understanding =-=[76, 20]-=-. Data reduction can be implemented in two ways. On the one hand, Instance Selection (IS) seeks to reduce the evidences (i.e., number of rows) in the training set by selecting the most relevant instan... |
86 |
PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware
- Royal, Halpin, et al.
- 2006
Citation Context ...seems a more promising solution to this problem [32]. One solution to solve this obvious limitation of our malware detection method is the use of a generic dynamic unpacking schema such as PolyUnpack =-=[63]-=-, Renovo [32], OmniUnpack [47] and Eureka [72]. These methods execute the sample in a contained environment and extract the actual payload, allowing further static or dynamic analysis of the executabl... |
84 | WEKA: The Waikato Environment for Knowledge Analysis
- Garner
- 1995
Citation Context ...a labelled dataset. In this work, we use Random Forest, which is an ensemble (i.e., combination of weak classifiers) of different randomly-built decision trees [7]. Further, we also use J48, the Weka =-=[25]-=- implementation of the C4.5 algorithm [61]. 4.2. Support Vector Machines (SVM) SVM classifiers consist of a hyperplane dividing an n-dimensional-space-based representation of the data into two region... |
82 |
Random forests, Machine Learning 45
- Breiman
- 2001
Citation Context ...ph structure of these trees by means of a labelled dataset. In this work, we use Random Forest, which is an ensemble (i.e., combination of weak classifiers) of different randomly-built decision trees =-=[7]-=-. Further, we also use J48, the Weka [25] implementation of the C4.5 algorithm [61]. 4.2. Support Vector Machines (SVM) SVM classifiers consist of a hyperplane dividing an n-dimensional-space-based r... |
82 | Testing malware detectors
- Christodorescu, Jha
- 2004
Citation Context ... the behaviour of the program (e.g., nop instructions); code reordering, which changes the order of program instructions; and variable renaming, which replaces a variable identifier with another one =-=[15]-=-. Several approaches have been proposed by the research community to counter these obfuscation techniques. For instance, Sung, Xu and Chavez [74, 82] introduced a method for computing the similarity b... |
75 | Learning to detect malicious executables in the wild
- Kolter, Maloof
- 2004
Citation Context ...ective binary codes. Specifically, they applied several classifiers to three different feature extraction approaches: program headers, string features and byte sequence features. Later, Kolter et al. =-=[36]-=- improved the results obtained by Schultz et al. by using n-grams (i.e., overlapping byte sequences) instead of non-overlapping sequences. This method used several algorithms and the best results wer... |
71 | Renovo: a hidden code extractor for packed executables.
- Kang, Poosankam, et al.
- 2007
Citation Context ...d into memory. Indeed, static detection methods can deal with packed malware only by using the signatures of the packers. Accordingly, dynamic analysis seems a more promising solution to this problem =-=[32]-=-. One solution to solve this obvious limitation of our malware detection method is the use of a generic dynamic unpacking schema such as PolyUnpack [63], Renovo [32], OmniUnpack [47] and Eureka [72]. ... |
67 |
Computational methods of feature selection
- Liu, Motoda
- 2008
Citation Context ...set by selecting the most relevant instances or re-sampling new ones [43]. On the other hand, Feature Selection (FS) decreases the number of attributes or features (i.e., columns) in the training set =-=[44]-=-. We applied FS in our experiments when selecting the 1000 top-ranked opcode sequences. Because both IS and FS are very effective at reducing the size of the training set and helping to filtrate and c... |
64 |
Fileprints: Identifying File Types by n-gram Analysis
- Li, Wang, et al.
- 2005
Citation Context ...77, 27]. Anomaly detectors use information retrieved from benign software to obtain a benign behaviour profile. Then, every significant deviation from this profile is qualified as suspicious. Li et al. =-=[42]-=- proposed the fileprint (or n-gram) analysis in which a model or set of models attempt to characterise several file types on a system based on their structural (byte) composition. The main assumption be... |
57 | OmniUnpack: Fast, Generic, and Safe Unpacking of Malware
- Martignoni, Christodorescu, et al.
- 2007
Citation Context ...n to this problem [32]. One solution to solve this obvious limitation of our malware detection method is the use of a generic dynamic unpacking schema such as PolyUnpack [63], Renovo [32], OmniUnpack =-=[47]-=- and Eureka [72]. These methods execute the sample in a contained environment and extract the actual payload, allowing further static or dynamic analysis of the executable. Another solution is to use ... |
43 | Malware phylogeny generation using permutations of code.
- Karim, Walenstein, et al.
- 2005
Citation Context ...able: it cannot cope with code obfuscations and cannot detect previously unseen malware. Malware writers use code obfuscation techniques [40] to hide the actual behaviour of their malicious creations =-=[8, 84, 13, 33]-=-. Examples of these obfuscation algorithms include garbage insertion, which consists of adding sequences that do not modify the behaviour of the program (e.g., nop instructions); code reordering, wh... |
40 | Recent advances in clustering: A brief survey.
- Kotsiantis, Pintelas
- 2004
Citation Context ...ed machine-learning algorithms, or clustering algorithms, try to assess how data are organised into different groups called clusters. In this type of machine-learning, data do not need to be labelled =-=[38]-=-. Finally, semi-supervised machine-learning algorithms use a mixture of both labelled and unlabelled data in order to build models, thus improving the accuracy of unsupervised methods [12]. Since in o... |
36 | Detecting Self-Mutating Malware Using Control-Flow Graph Matching
- Bruschi, Martignoni, et al.
- 2006
Citation Context ...able: it cannot cope with code obfuscations and cannot detect previously unseen malware. Malware writers use code obfuscation techniques [40] to hide the actual behaviour of their malicious creations =-=[8, 84, 13, 33]-=-. Examples of these obfuscation algorithms include garbage insertion, which consists of adding sequences that do not modify the behaviour of the program (e.g., nop instructions); code reordering, wh... |
35 |
Static analysis of binary code to isolate malicious behaviors
- Bergeron, Debbabi, et al.
- 1999
Citation Context ... tell-tale signs (e.g., operations that change the state of a program such as network access events and file operations) in order to determine whether an executable may be malicious. Bergeron et al. =-=[3]-=- presented several methods for disassembling binary executables, helping to build a representation of the execution of binary executables and improving the slicing of a program into idioms (i.e., se... |
33 |
MCF: A malicious code filter.
- Lo, Levitt, et al.
- 1995
Citation Context ...any action. (Footnote 2: A syscall or system call is the procedure through which an executable requests a service from the kernel of the operating system.) (CFG) analysis. An example was introduced by Lo et al. =-=[45]-=- as part of the Malicious Code Filter (MCF) project. Their method slices a program into blocks while looking for tell-tale signs (e.g., operations that change the state of a program such as network ac... |
29 |
A Survey of Malware Detection Techniques
- Idika, Mathur
- 2007
Citation Context ...ection due to the increasing growth of malware. Data mining approaches usually rely on machine-learning algorithms that use both malicious executables and benign software to detect malware in the wild =-=[28, 29, 81, 70]-=-. Machine learning is a discipline within Artificial Intelligence (AI) concerned with the design and development of algorithms that allow computers to reason and make decisions based on data [5]. Gener... |
26 |
Instance selection and construction for data mining (pp. 3–20). http://books.google.co.in/books/about/Instance_selection_and_construction_for.html?id=4qjTyvCpnGgC
- Liu, Motoda
- 2001
Citation Context ...emented in two ways. On the one hand, Instance Selection (IS) seeks to reduce the evidences (i.e., number of rows) in the training set by selecting the most relevant instances or re-sampling new ones =-=[43]-=-. On the other hand, Feature Selection (FS) decreases the number of attributes or features (i.e., columns) in the training set [44]. We applied FS in our experiments when selecting the 1000 top-ranked... |
26 |
Intrusion detection by machine learning: a review, Expert Systems with Applications
- Tsai, Hsu, et al.
- 2009
Citation Context ...al with unknown malware that the classic signature method cannot handle: anomaly detectors and data-mining-based detectors. These approaches have also been used in similar domains like intrusion detection =-=[39, 23, 77, 27]-=-. Anomaly detectors use information retrieved from benign software to obtain a benign behaviour profile. Then, every significant deviation from this profile is qualified as suspicious. Li et al. [42] pr... |
24 | Opcodes as predictor for malware.
- Bilar
- 2007
Citation Context ...ational codes in machine language). Our method is based on the frequency of appearance of opcode-sequences: it trains several data-mining algorithms in order to detect unknown malware. A recent study =-=[4]-=- statistically analysed the ability of single opcodes to serve as the basis for malware detection and confirmed their high reliability for determining the maliciousness of executables. In a previous w... |
24 | McBoost: Boosting scalability in malware collection and analysis using statistical classification of executables.
- Perdisci, Lanzi, et al.
- 2008
Citation Context ...tandard sections, the number of executable sections, the entropy of the PE header; and (ii) the classification through machine-learning models, e.g., Naïve Bayes, J48 and MLP. Later, Perdisci et al. =-=[55]-=- evolved their method and developed a fast statistical malware detection tool. They added new n-gram-based classifiers to combine their results with the previous MLP-based classifier. Given the state o... |
24 | Static analyzer of vicious executables (SAVE)
- Sung, Xu, et al.
- 2004
Citation Context ... which replaces a variable identifier with another one [15]. Several approaches have been proposed by the research community to counter these obfuscation techniques. For instance, Sung, Xu and Chavez =-=[74, 82]-=- introduced a method for computing the similarity between two executables by focusing on the degree of similarity within syscall sequences. This approach offered only limited performance because of... |
22 |
Classification of Packed Executables for Accurate Computer Virus Detection
- Perdisci, Lanzi, et al.
- 2008
Citation Context ...es in the malware body; because most of the common transformations operate at the source level, these detection methods can be easily thwarted [14]. These approaches were also used by Perdisci et al. =-=[54]-=- to detect packed executables. Perdisci et al. [54] proposed their first approach based on (i) the extraction of some features from the Portable Executable (PE), e.g., the number of standard and non-s... |
22 |
Facilitating the application of support vector regression by using a universal Pearson VII function based kernel.
- Ustun, Melssen, et al.
- 2006
Citation Context ... We used a Sequential Minimal Optimization (SMO) algorithm [57] and performed experiments with a polynomial kernel [1], a normalised polynomial kernel [1], Pearson VII function-based universal kernel =-=[79]-=-, and a Radial Basis Function (RBF) based kernel [1]. Testing the model: In order to measure the processing overhead of the proposed model, we measure the required representation time, number of fea... |
20 | Embedded malware detection using markov n-grams
- Shafiq, Khayam, et al.
- 2008
Citation Context ...sed several algorithms and the best results were achieved with a Boosted Decision Tree. In a similar vein, substantial research has focussed on n-gram distributions of byte sequences and data mining =-=[50, 71, 85, 67]-=-. Still, most of these methods are limited as they count certain bytes in the malware body; because most of the common transformations operate at the source level, these detection methods can be easil... |
19 | On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining
- Cano, Herrera, et al.
- 2006
Citation Context ... databases. As the dataset size grows, so does the issue of scalability. This problem produces excessive storage requirements, increases time complexity and impairs the general accuracy of the models =-=[10]-=-. To reduce disproportionate storage and time costs, it is necessary to reduce the original training set [19]. In order to solve this issue, data reduction is normally considered an appropriate prepro... |
19 |
Induction of Decision Trees, Machine Learning 1
- Quinlan
- 1986
Citation Context ...a snippet of the actual decision tree generated). The internal nodes represent conditions of the problem variables, and their final nodes (or leaves) constitute the final decision of the algorithm =-=[60]-=-. In our case, the final nodes would represent whether an executable is malware or not. Figure 2: Extract of the Decision Tree. Formally, a decision tree graph G = (V,E) consists of a non-empty set of... |
19 | MetaAware: Identifying Metamorphic Malware
- Zhang, Reeves
- 2007
Citation Context ...able: it cannot cope with code obfuscations and cannot detect previously unseen malware. Malware writers use code obfuscation techniques [40] to hide the actual behaviour of their malicious creations =-=[8, 84, 13, 33]-=-. Examples of these obfuscation algorithms include garbage insertion, which consists of adding sequences that do not modify the behaviour of the program (e.g., nop instructions); code reordering, wh... |
17 |
A distance-based attribute selection measure for decision tree induction
- Mantaras
- 1991
Citation Context ...sorted and generated an opcode relevance file. The opcode frequency file was saved so that we may calculate the relevance of the opcodes in future research using other measures such as the gain ratio =-=[46]-=- or chi-square [30]. (Footnote 6: http://www.eset.com/; footnote 7: http://www.frontiernet.net/ fys/newbasic.htm) This list of opcode relevances helped with more accurate malware detection because we were able to weight the... |
14 |
Polymorphic malicious executable scanner by API sequence analysis
- Xu, Sung, et al.
- 2004
Citation Context ... which replaces a variable identifier with another one [15]. Several approaches have been proposed by the research community to counter these obfuscation techniques. For instance, Sung, Xu and Chavez =-=[74, 82]-=- introduced a method for computing the similarity between two executables by focusing on the degree of similarity within syscall sequences. This approach offered only a limited performance because of... |
13 |
Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles
- Jin, Xu, et al.
Citation Context ...d an opcode relevance file. The opcode frequency file was saved so that we may calculate the relevance of the opcodes in future research using other measures such as the gain ratio [46] or chi-square =-=[30]-=-. 6http://www.eset.com/ 7http://www.frontiernet.net/ fys/newbasic.htm 6 This list of opcode relevances helped with more accurate malware detection because we were able to weight the final representati... |
13 |
A framework for enabling static malware analysis.
- Sharif, Yegneswaran, et al.
- 2008
Citation Context ...m [32]. One solution to solve this obvious limitation of our malware detection method is the use of a generic dynamic unpacking schema such as PolyUnpack [63], Renovo [32], OmniUnpack [47] and Eureka =-=[72]-=-. These methods execute the sample in a contained environment and extract the actual payload, allowing further static or dynamic analysis of the executable. Another solution is to use concrete unpacki... |
11 |
Unknown Malcode Detection via Text Categorization and the Imbalance Problem
- Moskovitch, Stopel, et al.
- 2008
Citation Context ...sed several algorithms and the best results were achieved with a Boosted Decision Tree. In a similar vein, substantial research has focussed on n-gram distributions of byte sequences and data mining =-=[50, 71, 85, 67]-=-. Still, most of these methods are limited as they count certain bytes in the malware body; because most of the common transformations operate at the source level, these detection methods can be easil... |
11 |
Idea: Opcode-sequence-based malware detection, Engineering Secure Software and Systems
- Santos
- 2010
Citation Context ...tistically analysed the ability of single opcodes to serve as the basis for malware detection and confirmed their high reliability for determining the maliciousness of executables. In a previous work =-=[66]-=-, we presented an approach focused on detecting obfuscated malware variants using the frequency of appearance of opcode-sequences to build an information retrieval representation of executables. We no... |
10 | N-Grams-based file signatures for malware detection
- Santos, Devesa, Bringas
- 2009
Citation Context ...sed several algorithms and the best results were achieved with a Boosted Decision Tree. In a similar vein, substantial research has focussed on n-gram distributions of byte sequences and data mining =-=[50, 71, 85, 67]-=-. Still, most of these methods are limited as they count certain bytes in the malware body; because most of the common transformations operate at the source level, these detection methods can be easil... |
10 |
OFFSS: Optimal fuzzy-valued feature subset selection
- Tsang, Yeung, et al.
- 2003
Citation Context ...orage and time costs, it is necessary to reduce the original training set [19]. In order to solve this issue, data reduction is normally considered an appropriate preprocessing optimisation technique =-=[59, 78]-=-. Such techniques have many potential advantages such as reducing measurement, storage and Figure 14: Comparison of the results in terms of FPR for the combination of features of opcode-sequence lengt... |
9 | Using engine signature to detect metamorphic malware
- Chouchane, Lakhotia
- 2006
Citation Context ...able: it cannot cope with code obfuscations and cannot detect previously unseen malware. Malware writers use code obfuscation techniques [40] to hide the actual behaviour of their malicious creations =-=[8, 84, 13, 33]-=-. Examples of these obfuscation algorithms include garbage insertion, which consists of adding sequences that do not modify the behaviour of the program (e.g., nop instructions); code reordering, wh... |
9 |
Malware Detection Using Adaptive Data Compression, AISec ’08
- Zhou, Inge
- 2008
Citation Context ...sed several algorithms and the best results were achieved with a Boosted Decision Tree. In a similar vein, substantial research has focussed on n-gram distributions of byte sequences and data mining =-=[50, 71, 85, 67]-=-. Still, most of these methods are limited as they count certain bytes in the malware body; because most of the common transformations operate at the source level, these detection methods can be easil... |
8 |
Detection of malicious code by applying machine learning classifiers on static features: a state-of-the-art survey, Information Security
- Shabtai, Moskovitch, et al.
- 2009
Citation Context ...ection due to the increasing growth of malware. Data mining approaches usually rely on machine-learning algorithms that use both malicious executables and benign software to detect malware in the wild =-=[28, 29, 81, 70]-=-. Machine learning is a discipline within Artificial Intelligence (AI) concerned with the design and development of algorithms that allow computers to reason and make decisions based on data [5]. Gener... |
7 |
On the versatility of radial basis function neural networks: A case study in the field of intrusion detection
- Fisch, Hofmann, et al.
- 2010
Citation Context ...al with unknown malware that classic signature method cannot handle: anomaly detectors and data-mining-based detectors. These approaches have been also used in similar domains like intrusion detection =-=[39, 23, 77, 27]-=-. Anomaly detectors use information retrieved from benign software to obtain a benign behaviour profile. Then, every significant deviation from this profile is qualified as suspicious. Li et al. [42] pr... |
7 |
Processing virus collections
- Morley
- 2001
Citation Context ...8]. In the past, malware creators were motivated mainly by fame or glory. Most current malware, however, is economically motivated [51]. Commercial anti-malware solutions rely on a signature database =-=[49]-=- (i.e., list of signatures) for detection. An example of a signature is a sequence of bytes that is always present within a malicious executable and within the files already infected by that malware. ... |
7 |
The evolution of commercial malware development kits and colour-by-numbers custom malware
- Ollmann
- 2008
Citation Context ...re that has been explicitly designed to harm computers or networks [58]. In the past, malware creators were motivated mainly by fame or glory. Most current malware, however, is economically motivated =-=[51]-=-. Commercial anti-malware solutions rely on a signature database [49] (i.e., list of signatures) for detection. An example of a signature is a sequence of bytes that is always present within a malicio... |
6 |
Multiple criteria mathematical programming for multi-class classification and application in network intrusion detection, Information Sciences 179
- Kou, Peng, et al.
- 2009
Citation Context ...al with unknown malware that classic signature method cannot handle: anomaly detectors and data-mining-based detectors. These approaches have been also used in similar domains like intrusion detection =-=[39, 23, 77, 27]-=-. Anomaly detectors use information retrieved from benign software to obtain a benign behaviour profile. Then, every significant deviation from this profile is qualified as suspicious. Li et al. [42] pr... |
6 |
Principles and practise of x-raying
- Perriot, Ferrie
- 2004
Citation Context ...0.11 ± 0.03, 0.93 ± 0.02; BN: K2 87.29 ± 2.59, 0.83 ± 0.04, 0.08 ± 0.03, 0.94 ± 0.02; BN: Hill Climber 87.29 ± 2.59, 0.83 ± 0.04, 0.08 ± 0.03, 0.94 ± 0.02; BN: TAN 93.40 ± 1.80, 0.91 ± 0.03, 0.04 ± 0.02, 0.98 ± 0.01. encryption =-=[56]-=-. Still, these techniques cannot cope with the increasing use of packing techniques, and we suggest the use of dynamic unpacking schemas to confront the problem. Fourth, it may seem that our method d... |
6 |
Comparative analysis of regression and machine learning methods for predicting fault proneness models
- Singh, Kaur, et al.
- 2009
Citation Context ...n in equation 11): Accuracy(%) = \frac{TP + TN}{TP + FP + FN + TN} \cdot 100 (11) Besides, we measured the Area Under the ROC Curve (AUC) that establishes the relation between false negatives and false positives =-=[73]-=-. The ROC (Receiver Operator Characteristics) curve is obtained by plotting the TPR against the FPR. Figure 6: Time results for extracting the assembly code from the binary. As we can see the requi... |
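The evaluation measures named in this excerpt (accuracy, TPR, FPR and AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the example counts and the example scores are all assumptions made for illustration, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the curve obtained by plotting TPR against FPR.

```python
def accuracy_pct(tp, tn, fp, fn):
    """Accuracy in percent: correctly classified samples over all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return 100.0 * (tp + tn) / total


def rates(tp, tn, fp, fn):
    """Return (TPR, FPR): the two axes of a ROC plot."""
    tpr = tp / (tp + fn)  # share of malware correctly flagged
    fpr = fp / (fp + tn)  # share of benign files wrongly flagged
    return tpr, fpr


def roc_auc(scores, labels):
    """Rank-based AUC: probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical confusion counts and classifier scores, for illustration only.
example_accuracy = accuracy_pct(tp=90, tn=85, fp=15, fn=10)
example_tpr, example_fpr = rates(tp=90, tn=85, fp=15, fn=10)
example_auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a sketch; a sort-based variant would be preferable on large test sets.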
4 | study on the use of coevolutionary algorithms for instance and feature selection
- Derrac, Garcia, et al.
- 2009
Citation Context ...uences. Because both IS and FS are very effective at reducing the size of the training set and helping to filter and clean noisy data, thereby improving the accuracy of machine-learning classifiers =-=[6, 21]-=-, we strongly encourage the use of these methods. 7. Conclusions Malware detection has become a major topic of research and concern owing to the increasing growth of malicious code in recent years. Th... |
4 |
Shielding wireless sensor network using Markovian intrusion detection system with attack pattern mining
- Huang, Liao, et al.
- 2013
Citation Context ...al with unknown malware that classic signature method cannot handle: anomaly detectors and data-mining-based detectors. These approaches have been also used in similar domains like intrusion detection =-=[39, 23, 77, 27]-=-. Anomaly detectors use information retrieved from benign software to obtain a benign behaviour profile. Then, every significant deviation from this profile is qualified as suspicious. Li et al. [42] pr... |
3 | Detecting a malicious executable without prior knowledge of its patterns
- Cai, Theiler, et al.
- 2005
Citation Context ...their structural (byte) composition. The main assumption behind this analysis is that benign files are composed of predictable regular byte structures for their respective types. Likewise, Cai et al. =-=[9]-=- used byte sequence frequencies to detect malware. Their goal was to use only information about benign software to measure the deviations from the benign behaviour profile. Specifically, they applied a G... |
3 |
Instance reduction approach to machine learning and multi-database mining
- Czarnowski, Jedrzejowicz
- 2006
Citation Context ...age requirements, increases time complexity and impairs the general accuracy of the models [10]. To reduce disproportionate storage and time costs, it is necessary to reduce the original training set =-=[19]-=-. In order to solve this issue, data reduction is normally considered an appropriate preprocessing optimisation technique [59, 78]. Such techniques have many potential advantages such as reducing meas... |
3 |
Behavioral detection of malware: from a survey towards an established taxonomy
- Jacob, Debar, et al.
- 2008
Citation Context ...ection due to the increasing growth of malware. Data mining approaches usually rely on machine-learning algorithms that use both malicious executables and benign software to detect malware in the wild =-=[28, 29, 81, 70]-=-. Machine learning is a discipline within Artificial Intelligence (AI) concerned with the design and development of algorithms that allow computers to reason and make decisions based on data [5]. Gener... |
3 |
Information gain and a general measure of correlation, Biometrika 70
- Kent
- 1983
Citation Context ...ne and two. The number of features obtained with an opcode-sequence length of two and above was very high (see Figure 8). To deal with this, we applied a feature selection step using Information Gain =-=[34]-=- and we selected the top 1000 features, which represent 0.6% of the total number of features in the case of n = 2. Once we obtained the reduced datasets, we tested the suitability of our proposed a... |
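The feature selection step described in this excerpt (rank features by Information Gain, keep the top k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: features are assumed to be discrete (for example, presence or absence of an opcode sequence), whereas the excerpt does not say how the frequency values were discretised, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy that remains after
    splitting the samples by the (discrete) feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remaining = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remaining


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest information gain.
    `columns` maps a feature name to its per-sample values."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): 1 = malware, 0 = benign; feature names are made up.
toy_labels = [1, 1, 0, 0]
toy_columns = {
    "mov_push": [1, 1, 0, 0],  # perfectly separates the classes
    "nop_nop": [1, 0, 1, 0],   # carries no class information
    "call_ret": [1, 1, 1, 0],  # partially informative
}
selected = top_k_features(toy_columns, toy_labels, k=2)
```

With k set to 1000 and one column per opcode sequence, the same ranking would yield a reduced feature set of the size mentioned in the excerpt.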
2 |
On the Concept of Software Obfuscation
- Kuzurin, Shokurov, et al.
- 2007
Citation Context ...ues render the signature-based method less than completely reliable: it cannot cope with code obfuscations and cannot detect previously unseen malware. Malware writers use code obfuscation techniques =-=[40]-=- to hide the actual behaviour of their malicious creations [8, 84, 13, 33]. Examples of these obfuscation algorithms include garbage insertion, which consists of adding sequences which do not modify t... |
1 |
The effectiveness of anti-malware tools, Computer Fraud & Security 2009
- Potter, Day
- 2009
Citation Context ...print submitted to Information Sciences August 22, 2011 1. Introduction Malware (or malicious software) is defined as computer software that has been explicitly designed to harm computers or networks =-=[58]-=-. In the past, malware creators were motivated mainly by fame or glory. Most current malware, however, is economically motivated [51]. Commercial anti-malware solutions rely on a signature database [4... |