The Relationship Between Precision-Recall and ROC Curves
In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, 2006
Cited by 415 (4 self)
Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm’s performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note that differences between the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.
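The claim that linear interpolation is incorrect in PR space can be checked numerically. The sketch below assumes that, between two thresholds, false positives grow linearly with true positives (which is what a straight segment in ROC space implies) and maps each intermediate count to a (recall, precision) pair. `Counts`, `to_pr`, and `interpolate_pr` are illustrative names, not code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """True-positive and false-positive counts at one decision threshold."""
    tp: int
    fp: int


def to_pr(tp: float, fp: float, n_pos: int) -> tuple[float, float]:
    """Map confusion-matrix counts to a (recall, precision) pair."""
    recall = tp / n_pos
    # Convention (an assumption): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    return recall, precision


def interpolate_pr(a: Counts, b: Counts, n_pos: int) -> list[tuple[float, float]]:
    """PR points between thresholds a and b, one per additional true positive.

    Assumes false positives grow linearly with true positives between a and b,
    i.e. the segment is a straight line in ROC space.
    """
    if b.tp <= a.tp:
        raise ValueError("b must have more true positives than a")
    fp_per_tp = (b.fp - a.fp) / (b.tp - a.tp)
    points = []
    for x in range(b.tp - a.tp + 1):
        tp = a.tp + x
        fp = a.fp + fp_per_tp * x
        points.append(to_pr(tp, fp, n_pos))
    return points


a, b = Counts(tp=10, fp=10), Counts(tp=20, fp=40)
curve = interpolate_pr(a, b, n_pos=20)
print(curve[5])  # (0.75, 0.375)
```

In this example the midpoint in recall, (0.75, 0.375), falls below the precision of about 0.417 that a straight line between (0.5, 0.5) and (1.0, 0.333) would give, so linear interpolation in PR space would overstate performance here.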
Learning from imbalanced data
IEEE Transactions on Knowledge and Data Engineering, 2009
Cited by 260 (6 self)
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
Index Terms: Imbalanced learning, classification, sampling methods, cost-sensitive learning, kernel-based learning, active learning, assessment metrics.
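Random oversampling is one of the simplest sampling methods for imbalanced data: minority-class examples are duplicated until the class distribution is balanced. The function below is a minimal, hypothetical sketch (its name and signature are not from the survey) that raises every class to the majority-class count.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each minority class until every
    class is as frequent as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        # Draw (with replacement) the missing number of copies for this class.
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 4, 1: 4})
```

Because it copies existing examples verbatim, random oversampling can encourage overfitting to the minority class; synthetic approaches such as SMOTE were proposed partly to address this.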
Sound and efficient inference with probabilistic and deterministic dependencies
2006
Cited by 133 (17 self)
Reasoning with both probabilistic and deterministic dependencies is important for many real-world problems, and in particular for the emerging field of statistical relational learning. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deterministic or near-deterministic dependencies are present, and logical ones like satisfiability testing are inapplicable to probabilistic ones. In this paper we propose MC-SAT, an inference algorithm that combines ideas from MCMC and satisfiability. MC-SAT is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic. From the point of view of MCMC, MC-SAT is a slice sampler with an auxiliary variable per clause, and with a satisfiability-based method for sampling the original variables given the auxiliary ones. From the point of view of satisfiability, MC-SAT wraps a procedure around the SampleSAT uniform sampler that enables it to sample from highly non-uniform distributions over satisfying assignments. Experiments on entity resolution and collective classification problems show that MC-SAT greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.
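The slice-sampling loop described above can be sketched for a toy ground network. Assumptions: clauses are lists of (variable, required value) literals, weights are non-negative with `math.inf` marking hard clauses, and exhaustive enumeration of assignments replaces SampleSAT as the uniform sampler over satisfying assignments, so this only scales to a handful of variables. `satisfied` and `mc_sat` are hypothetical names.

```python
import itertools
import math
import random


def satisfied(clause, state):
    """A clause is a list of (variable_index, required_value) literals."""
    return any(state[v] == val for v, val in clause)


def mc_sat(n_vars, clauses, weights, n_samples, seed=0):
    """Toy MC-SAT with non-negative weights (math.inf marks a hard clause).

    Brute-force enumeration stands in for SampleSAT.
    """
    rng = random.Random(seed)
    states = list(itertools.product([False, True], repeat=n_vars))
    hard = [c for c, w in zip(clauses, weights) if math.isinf(w)]
    # Start from any state that satisfies every hard clause.
    state = next(s for s in states if all(satisfied(c, s) for c in hard))
    samples = []
    for _ in range(n_samples):
        # Slice step: keep each currently satisfied clause with prob 1 - e^{-w}
        # (hard clauses are always kept, since e^{-inf} = 0).
        m = [c for c, w in zip(clauses, weights)
             if satisfied(c, state) and rng.random() < 1.0 - math.exp(-w)]
        # Sample uniformly among states satisfying every kept clause.
        state = rng.choice([s for s in states if all(satisfied(c, s) for c in m)])
        samples.append(state)
    return samples


clauses = [[(0, True)], [(1, True)]]  # x0 (hard), x1 (soft)
weights = [math.inf, 1.0]
samples = mc_sat(2, clauses, weights, n_samples=5000)
p_x1 = sum(s[1] for s in samples) / len(samples)
print(round(p_x1, 2))  # should be near e / (1 + e), about 0.73
```

With one hard clause forcing x0 and a soft clause of weight 1 on x1, the corresponding distribution gives P(x1) = e / (1 + e), and the sample frequency of x1 should approach that value as the chain runs.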
Joint inference in information extraction
In Proceedings of the 22nd National Conference on Artificial Intelligence, 2007
Cited by 119 (8 self)
The goal of information extraction is to extract database records from text or semi-structured sources. Traditionally, information extraction proceeds by first segmenting each candidate record separately, and then merging records that refer to the same entities. While computationally efficient, this approach is suboptimal, because it ignores the fact that segmenting one candidate record can help to segment similar ones. For example, resolving a well-segmented field with a less-clear one can disambiguate the latter’s boundaries. In this paper we propose a joint approach to information extraction, where segmentation of all records and entity resolution are performed together in a single integrated inference process. While a number of previous authors have taken steps in this direction (e.g., Pasula et al. (2003), Wellner et al. (2004)), to our knowledge this is the first fully joint approach. In experiments on the CiteSeer and Cora citation matching datasets, joint inference improved accuracy, and our approach outperformed previous ones. Further, by using Markov logic and the existing algorithms for it, our solution consisted mainly of writing the appropriate logical formulas, and required much less engineering than previous ones.
Learning the structure of Markov logic networks
In Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 116 (21 self)
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
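Pseudo-likelihood replaces the intractable joint likelihood with the product of each ground atom's probability given its Markov blanket. A minimal sketch on a ground network follows; the per-atom factors `atom_weights` are an assumption standing in for the paper's weighting scheme, which the abstract does not spell out, and optimizing the clause weights themselves is not shown. All function names are hypothetical.

```python
import math


def satisfied(clause, state):
    """A clause is a list of (atom_index, required_value) literals."""
    return any(state[v] == val for v, val in clause)


def score(state, clauses, weights):
    """Sum of weights of satisfied ground clauses (unnormalized log-probability)."""
    return sum(w for c, w in zip(clauses, weights) if satisfied(c, state))


def log_add_exp(a, b):
    """Numerically stable log(e^a + e^b)."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def weighted_pll(state, clauses, weights, atom_weights=None):
    """Sum over ground atoms of c_i * log P(x_i | Markov blanket of x_i)."""
    if atom_weights is None:
        atom_weights = [1.0] * len(state)
    s = score(state, clauses, weights)
    total = 0.0
    for i, c_i in enumerate(atom_weights):
        flipped = list(state)
        flipped[i] = not flipped[i]
        s_flip = score(flipped, clauses, weights)
        # P(x_i | rest) = e^s / (e^s + e^s_flip); clauses not touching x_i cancel.
        total += c_i * (s - log_add_exp(s, s_flip))
    return total


# Hypothetical ground clauses: Smokes(A), and (not Smokes(A) or Cancer(A)).
clauses = [[(0, True)], [(0, False), (1, True)]]
weights = [1.5, 2.0]
pll = weighted_pll((True, True), clauses, weights)
```

Recomputing the full score for every flip keeps the sketch short; an efficient implementation would only re-evaluate the clauses that mention the flipped atom.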
Lifted first-order belief propagation
In Association for the Advancement of Artificial Intelligence (AAAI), 2008
Cited by 115 (15 self)
Unifying first-order logic and probability is a long-standing goal of AI, and in recent years many representations combining aspects of the two have been proposed. However, inference in them is generally still at the level of propositional logic, creating all ground atoms and formulas and applying standard probabilistic inference methods to the resulting network. Ideally, inference should be lifted as in first-order logic, handling whole sets of indistinguishable objects together, in time independent of their cardinality. Poole (2003) and Braz et al. (2005, 2006) developed a lifted version of the variable elimination algorithm, but it is extremely complex, generally does not scale to realistic domains, and has only been applied to very small artificial problems. In this paper we propose the first lifted version of a scalable probabilistic inference algorithm, belief propagation (loopy or not). Our approach is based on first constructing a lifted network, where each node represents a set of ground atoms that all pass the same messages during belief propagation. We then run belief propagation on this network. We prove the correctness and optimality of our algorithm. Experiments show that it can greatly reduce the cost of inference.
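A lifted network groups ground atoms that would exchange identical messages. One way to approximate that grouping, sketched below, is iterative color refinement on the ground factor graph: atoms start grouped by their evidence, and groups split whenever members appear in differently colored factors or argument positions, until no group splits further. This is a simplified illustration that assumes factors with the same label share the same potential; it is not the paper's exact construction, and `lift` is a hypothetical name.

```python
from collections import defaultdict


def lift(var_colors, factors):
    """Group ground atoms that are indistinguishable to belief propagation.

    var_colors: dict mapping each atom to an initial color (e.g. its evidence value).
    factors:    list of (factor_label, tuple_of_atoms); same label = same potential.
    Returns a dict mapping each atom to an integer group id.
    """
    colors = dict(var_colors)
    while True:
        # A factor's signature: its label plus its arguments' colors, in order.
        fac_sigs = [(label, tuple(colors[v] for v in atoms)) for label, atoms in factors]
        # An atom's signature: its color plus the (factor signature, position) pairs it occurs in.
        incident = defaultdict(list)
        for sig, (_, atoms) in zip(fac_sigs, factors):
            for pos, v in enumerate(atoms):
                incident[v].append((sig, pos))
        sigs = {v: (colors[v], tuple(sorted(incident[v]))) for v in colors}
        # Relabel distinct signatures with small integers.
        table = {}
        new_colors = {v: table.setdefault(s, len(table)) for v, s in sigs.items()}
        # Signatures include the old color, so groups only split; stop when none do.
        if len(table) == len(set(colors.values())):
            return new_colors
        colors = new_colors


groups = lift({"a": "unknown", "b": "unknown", "x": "unknown"},
              [("f", ("a", "x")), ("f", ("b", "x"))])
print(groups["a"] == groups["b"], groups["a"] == groups["x"])  # True False
```

Atoms a and b play the same role, so they share a group and belief propagation would need to compute their messages only once; giving a its own evidence value would split them apart.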
Entity Resolution with Markov Logic
In ICDM, 2006
Cited by 105 (10 self)
Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in it has grown rapidly in recent years, and many approaches have been proposed. However, they tend to address only isolated aspects of the problem, and are often ad hoc. This paper proposes a well-founded, integrated solution to the entity resolution problem based on Markov logic. Markov logic combines first-order logic and probabilistic graphical models by attaching weights to first-order formulas, and viewing them as templates for features of Markov networks. We show how a number of previous approaches can be formulated and seamlessly combined in Markov logic, and how the resulting learning and inference problems can be solved efficiently. Experiments on two citation databases show the utility of this approach, and evaluate the contribution of the different components.
Automatically Refining the Wikipedia Infobox Ontology
2008
Cited by 102 (7 self)
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia’s infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia’s infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.
Efficient weight learning for Markov logic networks
In Proceedings of the Eleventh European Conference on Principles and Practice of Knowledge Discovery in Databases, 2007
Cited by 87 (7 self)
Markov logic networks (MLNs) combine Markov networks and first-order logic, and are a powerful and increasingly popular representation for statistical relational learning. The state-of-the-art method for discriminative learning of MLN weights is the voted perceptron algorithm, which is essentially gradient descent with an MPE approximation to the expected sufficient statistics (true clause counts). Unfortunately, these can vary widely between clauses, causing the learning problem to be highly ill-conditioned, and making gradient descent very slow. In this paper, we explore several alternatives, from per-weight learning rates to second-order methods. In particular, we focus on two approaches that avoid computing the partition function: diagonal Newton and scaled conjugate gradient. In experiments on standard SRL datasets, we obtain order-of-magnitude speedups, or more accurate models given comparable learning times.
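The diagonal Newton idea can be sketched as follows: the gradient of the log-likelihood with respect to clause weight i is the observed true-grounding count minus its expected count, and the corresponding diagonal entry of the Hessian is minus the variance of that count, so each weight gets its own step size. Both moments are estimated from samples of the current model (for example, drawn with MC-SAT), which avoids the partition function. `diagonal_newton_step` is a hypothetical helper that takes a full, unregularized step; a practical learner would add step-size control.

```python
def diagonal_newton_step(weights, data_counts, sampled_counts, eps=1e-8):
    """One diagonal-Newton update of MLN clause weights.

    weights:        current weight per clause
    data_counts:    true-grounding count of each clause in the training data
    sampled_counts: list of count vectors, one per sample from the current model
    """
    k, m = len(weights), len(sampled_counts)
    means = [sum(s[i] for s in sampled_counts) / m for i in range(k)]
    variances = [sum((s[i] - means[i]) ** 2 for s in sampled_counts) / m
                 for i in range(k)]
    # Gradient: n_data - E[n]; Hessian diagonal: -Var[n]; eps guards zero variance.
    return [w + (n - mu) / (var + eps)
            for w, n, mu, var in zip(weights, data_counts, means, variances)]


new_weights = diagonal_newton_step(
    weights=[0.0, 0.0],
    data_counts=[3, 5],
    sampled_counts=[[1, 4], [3, 6]],
)
print(new_weights)
```

Dividing each gradient component by its own count variance is what rescales clauses whose counts differ by orders of magnitude, the source of the ill-conditioning described above.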
First-order probabilistic models for coreference resolution
In HLT/NAACL, 2007
Cited by 86 (20 self)
Traditional noun phrase coreference resolution systems represent features only of pairs of noun phrases. In this paper, we propose a machine learning method that enables features over sets of noun phrases, resulting in a first-order probabilistic model for coreference. We outline a set of approximations that make this approach practical, and apply our method to the ACE coreference dataset, achieving a 45% error reduction over a comparable method that only considers features of pairs of noun phrases. This result demonstrates an example of how a first-order logic representation can be incorporated into a probabilistic model and scaled efficiently.