Results 1  10
of
79
Mining DistanceBased Outliers in Near Linear Time with Randomization and a Simple Pruning Rule
, 2003
"... Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic ..."
Abstract

Cited by 159 (4 self)
 Add to MetaCart
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real highdimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the e#ciency is because the time to process nonoutliers, which are the majority of examples, does not depend on the size of the data set.
Topdown induction of clustering trees
 In 15th Int’l Conf. on Machine Learning
, 1998
"... An approach to clustering is presented that adapts the basic topdown induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for firs ..."
Abstract

Cited by 152 (29 self)
 Add to MetaCart
An approach to clustering is presented that adapts the basic topdown induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first order clustering. The TIC system employs the first order logical decision tree representation of the inductive logic programming system Tilde. Various experiments with TIC are presented, in both propositional and relational domains. 1
A Simple Relational Classifier
 Proceedings of the Second Workshop on MultiRelational Data Mining (MRDM2003) at KDD2003
, 2003
"... We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilist ..."
Abstract

Cited by 111 (12 self)
 Add to MetaCart
(Show Context)
We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work.
Kernels and Distances for Structured Data
 Machine Learning
, 2004
"... This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higherorder logic. Our main theo ..."
Abstract

Cited by 66 (3 self)
 Add to MetaCart
(Show Context)
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higherorder logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of realworld datasets. By converting our kernel to a distance pseudometric for 1nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene dataset by more than 10%.
MultiRelational Data Mining: An Introduction
"... Data mining algorithms look for patterns in data. While most existing data mining approaches look for patterns in a single data table, multirelational data mining (MRDM) approaches look for patterns that involve multiple tables ..."
Abstract

Cited by 64 (0 self)
 Add to MetaCart
Data mining algorithms look for patterns in data. While most existing data mining approaches look for patterns in a single data table, multirelational data mining (MRDM) approaches look for patterns that involve multiple tables
A Polynomial Time Computable Metric Between Point Sets
, 2000
"... Measuring the similarity or distance between two sets of points in a metric space is an important problem in machine learning and has also applications in other disciplines e.g. in computational geometry, philosophy of science, methods for updating or changing theories, . . . . Recently Eiter and Ma ..."
Abstract

Cited by 54 (5 self)
 Add to MetaCart
Measuring the similarity or distance between two sets of points in a metric space is an important problem in machine learning and has also applications in other disciplines e.g. in computational geometry, philosophy of science, methods for updating or changing theories, . . . . Recently Eiter and Mannila have proposed a new measure which is computable in polynomial time. However, it is not a distance function in the mathematical sense because it does not satisfy the triangle inequality.
Clustering ontologybased metadata in the semantic web
 Proceedings of 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD
, 2002
"... Abstract The Semantic Web is an extension of the current web in which information is given welldefined meaning, better enabling computers and people to work in cooperation. Recently, different applications based on this vision have been designed, e.g. in the fields of knowledge management, communi ..."
Abstract

Cited by 46 (1 self)
 Add to MetaCart
(Show Context)
Abstract The Semantic Web is an extension of the current web in which information is given welldefined meaning, better enabling computers and people to work in cooperation. Recently, different applications based on this vision have been designed, e.g. in the fields of knowledge management, community web portals, elearning, multimedia retrieval, etc. It is obvious that the complex metadata descriptions generated on the basis of predefined ontologies serve as perfect input data for machine learning techniques. In this paper we propose an approach for clustering ontologybased metadata. Main contributions of this paper are the definition of a set of similarity measures for comparing ontologybased metadata and an application study using these measures within a hierarchical clustering algorithm. 1
Distance between herbrand interpretations: A measure for approximations to a target concept
 in: Proceedings of the 7th International Workshop on Inductive Logic Programming
, 1997
"... ..."
The omnipresence of casebased reasoning in science and application
 KNOWLEDGEBASED SYSTEMS
, 1998
"... A surprisingly large number of research disciplines have contributed towards the development of knowledge on lazy problem solving, which is characterized by its storage of ground cases and its demand driven response to queries. Casebased reasoning (CBR) is an alternative, increasingly popular appro ..."
Abstract

Cited by 40 (0 self)
 Add to MetaCart
A surprisingly large number of research disciplines have contributed towards the development of knowledge on lazy problem solving, which is characterized by its storage of ground cases and its demand driven response to queries. Casebased reasoning (CBR) is an alternative, increasingly popular approach for designing expert systems that implements this approach. This paper lists pointers to some contributions in some related disciplines that offer insights for CBR research. We then outline a small number of Navy applications based on this approach that demonstrate its breadth of applicability. Finally, we list a few successful and failed attempts to apply CBR, and list some predictions on the future roles of CBR in applications.
Topdown induction of logical decision trees
 Artificial Intelligence
, 1998
"... Topdown induction of decision trees (TDIDT) is a very popular machine learning technique. Up till now, it has mainly been used for propositional learning, but seldomly for relational learning or inductive logic programming. The main contribution of this paper is the introduction of logical decision ..."
Abstract

Cited by 38 (1 self)
 Add to MetaCart
(Show Context)
Topdown induction of decision trees (TDIDT) is a very popular machine learning technique. Up till now, it has mainly been used for propositional learning, but seldomly for relational learning or inductive logic programming. The main contribution of this paper is the introduction of logical decision trees, which make it possible to use TDIDT in inductive logic programming. An implementation of this topdown induction of logical decision trees, the Tilde system, is presented and experimentally evaluated. 1