A survey of techniques for internet traffic classification using machine learning
- Communications Surveys & Tutorials, IEEE
Cited by 135 (4 self)
Abstract—The research community has begun looking for IP traffic classification techniques that do not rely on ‘well known’ TCP or UDP port numbers, or on interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification, an interdisciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.
Extending faceted navigation to RDF data
- PROC. 5TH SEMANTIC WEB CONF.’, LNCS 4273
, 2006
A survey of kernel and spectral methods for clustering
- Pattern Recognit.
, 2008
Cited by 88 (5 self)
Clustering algorithms are a useful tool for exploring data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and Neural Gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof that these two paradigms have the same objective is reported, since it has been proven that these two seemingly different approaches have the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
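As an illustration of the "kernel version of K-means" idea named in the abstract above, the following is a minimal sketch, not the survey's own formulation. It assumes a Gaussian (RBF) kernel and a caller-supplied initial labelling; the names `rbf_kernel`, `kernel_kmeans`, `gamma` and `init_labels` are hypothetical and chosen only for this example. Distances to cluster means are computed in feature space using kernel evaluations alone, which is what allows nonlinear boundaries between clusters.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length point tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points, init_labels, k, gamma=1.0, max_iter=50):
    """Kernel K-means starting from a given labelling (illustrative helper).

    The squared feature-space distance from point i to the mean of cluster c is
        K[i][i] - (2/|c|) * sum_{j in c} K[i][j]
                + (1/|c|^2) * sum_{j,l in c} K[j][l]
    so the cluster means never need to be formed explicitly.
    """
    n = len(points)
    # Precompute the full kernel (Gram) matrix.
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)]
         for i in range(n)]
    labels = list(init_labels)

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Third term of the distance: depends on the cluster only.
        within = [
            sum(K[j][l] for j in m for l in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in m) / len(m)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    # Deliberately poor alternating start; the two blobs should still separate.
    print(kernel_kmeans(pts, [0, 1, 0, 1, 0, 1], k=2))
```

Like ordinary K-means, this procedure only reaches a local optimum, so the result depends on the initial labelling and on the assumed kernel width `gamma`.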
Elimination Methods
, 2000
Cited by 67 (4 self)
As pointed out by Duarte and Pyle (1), the two-dimensional (2D) η-θ plot is a Ramachandran-like diagram that can provide a graphic representation of quantitatively distinct structural features for analyzing and modeling RNA three-dimensional (3D) structures. In particular, they showed that on this η-θ plot, clusters of nucleotides with similar η and θ pseudo-torsional angles have similar conformational properties and vice versa. To depict this η-θ plot, we prepared a dataset that includes non-redundant crystal structures with minimum resolution of 3.0 Å from the PDB database (2). This dataset contains 117 crystal RNA structures, including 74 structures used by Wadley et al. (3), with 9,527 nucleotides in total. We then used AMIGOS, which was developed by Duarte and Pyle (1), to calculate the η and θ pseudo-torsion angles for all non-terminal nucleotides (9,267 nt in total) from all RNA molecules in the above dataset and plotted these calculated pseudo-torsion angles on the axes of a 2D plot as illustrated in Figure 1. Instead of using the vector quantization (VQ) approach as in our previous work (4), we here applied the so-called affinity propagation (AP) clustering algorithm, introduced recently by Frey and Dueck (5), to classify all the non-terminal nucleotides in our prepared
A robust method to count and locate audio sources in a stereophonic linear instantaneous mixture
- in ICA
, 2006
Cited by 41 (10 self)
Abstract—We propose a method to count and estimate the mixing directions in an underdetermined multichannel mixture. The approach is based on the hypothesis that in the neighbourhood of some time-frequency points, only one source essentially contributes to the mixture: such time-frequency points can provide robust local estimates of the corresponding source direction. At the core of our contribution is a statistical model to exploit a local confidence measure which detects the time-frequency regions where such robust information is available. A clustering algorithm called DEMIX is proposed to merge the information from all time-frequency regions according to their confidence level. To estimate the delays of anechoic mixtures and overcome the intrinsic ambiguities of phase unwrapping as met with DUET, we propose a technique similar to GCC-PHAT which is able to estimate delays that can largely exceed one sample. We report an extensive experimental study which shows that the resulting method is more robust in conditions where all comparable DUET-like methods fail, in particular: a) when time-delays largely exceed one sample; b) when the source directions are very close. Index Terms—Blind source separation, multichannel audio, delay estimation, sparse component analysis, direction of arrival.
Ensemble-based discriminant learning with boosting for face recognition
- 10, 2009 DRAFT IVP/713183.V2 23
, 2006
Cited by 32 (6 self)
Abstract—In this paper, we propose a novel ensemble-based approach to boost performance of traditional Linear Discriminant Analysis (LDA)-based methods used in face recognition. The ensemble-based approach is based on the recently emerged technique known as “boosting.” However, it is generally believed that boosting-like learning rules are not suited to a strong and stable learner such as LDA. To overcome this limitation, a novel weakness analysis theory is developed here. The theory attempts to boost a strong learner by increasing the diversity between the classifiers created by the learner, at the expense of decreasing their margins, so as to achieve a tradeoff suggested by recent boosting studies for a low generalization error. In addition, a novel distribution accounting for the pairwise class discriminant information is introduced for effective interaction between the booster and the LDA-based learner. The integration of all these methodologies leads to the novel ensemble-based discriminant learning approach, capable of taking advantage of both the boosting and LDA techniques. Promising experimental results obtained on various difficult face recognition scenarios demonstrate the effectiveness of the proposed approach. We believe that this work is especially beneficial in extending the boosting framework to accommodate general (strong/weak) learners. Index Terms—Boosting, face recognition (FR), linear discriminant analysis, machine learning, mixture of linear models, small-sample-size (SSS) problem, strong learner.
Swarm Intelligence Algorithms for Data Clustering
- IN SOFT COMPUTING FOR KNOWLEDGE DISCOVERY AND DATA MINING BOOK, PART IV
Cited by 26 (1 self)
Clustering aims at representing large datasets by a smaller number of prototypes or clusters. It brings simplicity to modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks these days require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This, in turn, imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms known as Swarm Intelligence (SI) has recently emerged that meets these requirements and has been applied successfully to a number of real-world clustering problems. This chapter explores the role of SI in clustering different kinds of datasets. It finally describes a new SI technique for partitioning any dataset into an optimal number of groups through one run of optimization. Computer simulations undertaken in this research are also provided to demonstrate the effectiveness of the proposed algorithm.
A kernel-based two-class classifier for imbalanced data sets
- IEEE Trans Neural Netw. 2007; 18: 28–41. PMID: 17278459
Predicting Trust and Distrust in Social Networks
Cited by 22 (1 self)
Abstract—As user-generated content and interactions have overtaken the web as the default mode of use, questions of whom and what to trust have become increasingly important. Fortunately, online social networks and social media have made it easy for users to indicate whom they trust and whom they do not. However, this does not solve the problem, since each user is only likely to know a tiny fraction of other users; we must have methods for inferring trust and distrust between users who do not know one another. In this paper, we present a new method for computing both trust and distrust (i.e., positive and negative trust). We do this by combining an inference algorithm that relies on a probabilistic interpretation of trust based on random graphs with a modified spring-embedding algorithm. Our algorithm correctly classifies hidden trust edges as positive or negative with high accuracy. These results are useful in a wide range of social web applications where trust is important to user behavior and satisfaction.