Improved Heterogeneous Distance Functions
Journal of Artificial Intelligence Research, 1997
Cited by 173 (9 self)
Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
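A minimal sketch of how an HVDM-style distance can combine the two attribute types may help here. The abstract does not give the formula, so the sketch assumes the commonly cited form: nominal attributes contribute the squared difference of class-conditional probabilities P(c | value), and continuous attributes contribute |x - y| / (4 * sigma). The function names `fit_hvdm` and `hvdm` are hypothetical, and values unseen in training are not handled.

```python
import math
from collections import Counter, defaultdict


def fit_hvdm(rows, labels, nominal):
    """Collect per-attribute statistics from training data.

    rows: list of attribute tuples; labels: class label per row;
    nominal: set of column indices holding nominal attributes.
    """
    classes = sorted(set(labels))
    stats = {}
    for a in range(len(rows[0])):
        column = [r[a] for r in rows]
        if a in nominal:
            totals = Counter(column)
            per_class = defaultdict(Counter)
            for value, label in zip(column, labels):
                per_class[value][label] += 1
            # P(class | value) for every value seen in training
            stats[a] = {
                v: [per_class[v][c] / totals[v] for c in classes]
                for v in totals
            }
        else:
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats[a] = math.sqrt(var)  # population standard deviation (assumption)
    return stats


def hvdm(x, y, stats, nominal):
    """Distance between two instances with mixed attribute types."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if a in nominal:
            # squared normalized VDM: sum over classes of (P(c|x) - P(c|y))^2
            d2 = sum((p - q) ** 2 for p, q in zip(stats[a][xa], stats[a][ya]))
        else:
            sigma = stats[a] or 1.0  # guard against a constant column
            d2 = (abs(xa - ya) / (4 * sigma)) ** 2
        total += d2
    return math.sqrt(total)
```

Dividing by four standard deviations puts most continuous differences roughly in the 0 to 1 range, which keeps them comparable with the nominal terms.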
Computer Algorithms for Plagiarism Detection
1989
Cited by 37 (0 self)
This paper presents a survey of computer algorithms used for the detection of student plagiarism. A summary of several algorithms is provided. Common features of the different plagiarism detection algorithms are described. Ethical and administrative issues involving detected plagiarism are discussed.
Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context
In Proceedings of the International Conference on Machine Learning: Models, Technologies, and Applications, 2003
Cited by 19 (0 self)
A small subset of machine learning algorithms, mostly based on inductive learning, has been applied to the KDD 1999 Cup intrusion detection dataset and, as reported in the recent literature, performed dismally on the user-to-root and remote-to-local attack categories. Whether other machine learning algorithms can outperform those already employed is the question that motivates the study reported here. Of particular interest is whether certain algorithms perform better for certain attack classes and, consequently, whether a multi-expert classifier design can deliver the desired performance. This paper evaluates the performance of a comprehensive set of pattern recognition and machine learning algorithms on the four attack categories found in the KDD 1999 Cup intrusion detection dataset. The results of a simulation study implemented to that effect indicate that certain classification algorithms perform better for certain attack categories: a specific algorithm specialized for a given ...
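The multi-expert idea in this abstract can be illustrated with a small sketch: for each attack category, route to whichever algorithm scored best on that category. The function name `pick_experts`, the algorithm names, and the scores in the usage note are all hypothetical placeholders, not results from the paper.

```python
def pick_experts(scores):
    """Map each attack category to the best-scoring algorithm.

    scores: {algorithm_name: {category: detection_rate}}
    returns: {category: algorithm_name}
    """
    experts = {}
    for algo, by_category in scores.items():
        for category, rate in by_category.items():
            current = experts.get(category)
            # keep the first algorithm seen on ties
            if current is None or rate > scores[current][category]:
                experts[category] = algo
    return experts
```

For example, with made-up scores `{"clf_1": {"probe": 0.9, "dos": 0.5}, "clf_2": {"probe": 0.7, "dos": 0.8}}`, the selection assigns `probe` to `clf_1` and `dos` to `clf_2`.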
IGLUE: A Lattice-based Constructive Induction System.
In: Intl. Journal of Intelligent Data Analysis (IDA), 2001
Cited by 5 (0 self)
A machine learning (ML) system that combines lattice-based and instance-based learning (IBL) techniques is motivated and developed in this paper. We describe an IBL system over lattice theory called IGLUE that significantly improved both the complexity and accuracy of lattice-based learning systems. For this purpose, IGLUE uses the entropy function to select relevant lattice nodes, then extracts a set of new numerical features from the original set of boolean features, and finally applies a nearest neighbor technique with the Mahalanobis distance as the similarity measure between redescribed instances. IGLUE treats only domains described with symbolic features. In this paper, we present results of experiments we carried out to assess how well IGLUE performs on real problems, with other similarity measures and selection functions. We combine three selection functions and three similarity measures, and thus run nine experiments. We compare the performance of these combined st...
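For reference, the Mahalanobis distance mentioned in this abstract has the standard form below. The symbols are assumptions for illustration: $x$ and $y$ are redescribed instances (numerical feature vectors) and $S$ is the covariance matrix estimated from the training instances.

```latex
d_M(x, y) = \sqrt{(x - y)^{\top} S^{-1} (x - y)}
```

When $S$ is the identity matrix this reduces to the Euclidean distance, so the measure can be read as a Euclidean distance that corrects for feature scale and correlation.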
The Distribution Family of Similarity Distances
Cited by 2 (2 self)
Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge of the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we derive the contrary and report the theoretical result that Lp-norms (a class of commonly applied distance metrics) from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we show experimentally that they hold for various popular feature extraction algorithms and a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms.
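The two objects this abstract relates can be written out in their standard forms. The notation is an assumption, since the abstract gives none: $x$ and $y$ are $n$-dimensional feature vectors, $d$ is a distance value, and $\beta > 0$ and $\gamma > 0$ are the Weibull scale and shape parameters.

```latex
L_p(x, y) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p}

f(d; \beta, \gamma) = \frac{\gamma}{\beta} \left( \frac{d}{\beta} \right)^{\gamma - 1}
  \exp\!\left( -\left( \frac{d}{\beta} \right)^{\gamma} \right), \qquad d \ge 0
```

The claim is then that, for a fixed reference vector $x$, the values $L_p(x, y)$ over other vectors $y$ follow a density of the second form under the stated assumptions.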
Learning Similarity with Fuzzy Functions of Adaptable Complexity
Symp. on Spatial and Temporal Databases (SSTD), LNCS, 2003
Cited by 2 (2 self)
A common approach in database queries involves the multidimensional representation of objects by a set of features. These features are compared to the query representation and then combined together to produce a total similarity metric. In this paper we introduce a novel technique for similarity learning within features (attributes) by manipulating fuzzy membership functions (FMFs) of different complexity. Our approach is based on a gradual complexity increase adaptable to problem requirements. The underlying idea is that less adaptable functions will act as approximations for more complex ones.
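As an illustration of membership functions of increasing complexity, the sketch below uses triangular and trapezoidal shapes. These two families are an assumption chosen for familiarity; the abstract does not say which FMFs the paper uses, and the function names are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership with a < b < c: 0 outside (a, c), 1 at x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership with a < b <= c < d: 1 on the plateau [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)
```

Setting `b == c` in `trapezoidal` recovers `triangular`, which shows one way a simpler function can serve as a special case of, and a starting approximation for, a more adaptable one.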
Formalising optimal feature weight setting in case based diagnosis as linear programming problems
Knowledge-Based Systems, 2002
{yang, parekh, honavar}@cs.iastate.edu
1997
Multi-layer networks of threshold logic units offer an attractive framework for the design of pattern classification systems. A new constructive neural network learning algorithm (DistAl) based on inter-pattern distance is introduced. DistAl constructs a single hidden layer of hyperspherical threshold neurons. Each neuron is designed to exclude a cluster of training patterns belonging to the same class. The weights and thresholds of the hidden neurons are determined directly by comparing the inter-pattern distances of the training patterns. This offers a significant advantage over other constructive learning algorithms that use an iterative (and often time-consuming) weight modification strategy to train individual neurons. The individual clusters (represented by the hidden neurons) are combined by a single output layer of threshold neurons. The speed of DistAl makes it a good candidate for datamining and knowledge acquisition from very large datasets. The paper presents results of expe...
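A rough sketch of the non-iterative idea described in this abstract follows. It assumes a hyperspherical threshold neuron fires when an input lies within a distance shell around its center, and that thresholds are read directly off sorted inter-pattern distances. The names `spherical_neuron` and `best_shell` are hypothetical, Euclidean distance is an assumption, and distance ties between classes are ignored.

```python
import math


def spherical_neuron(center, low, high, x):
    """Return 1 when low <= dist(center, x) <= high, else 0."""
    return 1 if low <= math.dist(center, x) <= high else 0


def best_shell(center, patterns, labels):
    """Find the longest run of same-class patterns ordered by distance.

    Returns (low, high, label, count), or None for an empty pattern set.
    No iterative weight updates are needed: the thresholds are simply
    the distances of the first and last pattern in the run.
    """
    ranked = sorted(
        ((math.dist(center, p), lab) for p, lab in zip(patterns, labels)),
        key=lambda pair: pair[0],
    )
    best = None
    start = 0
    for i in range(1, len(ranked) + 1):
        # close the current run at the end of the list or on a class change
        if i == len(ranked) or ranked[i][1] != ranked[start][1]:
            run = (ranked[start][0], ranked[i - 1][0], ranked[start][1], i - start)
            if best is None or run[3] > best[3]:
                best = run
            start = i
    return best
```

A constructive loop in this style would repeat the search over candidate centers, add a neuron for the best shell, set aside the patterns it covers, and continue until none remain.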

