Results 1 - 10 of 815
A Survey on Transfer Learning
"... A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task i ..."
Abstract
-
Cited by 459 (24 self)
- Add to MetaCart
(Show Context)
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, sample selection bias, and covariate shift. We also explore some potential future issues in transfer learning research.
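To make the covariate-shift setting mentioned above concrete, here is a minimal sketch of one classic remedy the survey discusses, instance re-weighting: source examples are weighted by an estimate of P_target(x)/P_source(x) obtained from a source-vs-target domain classifier. The use of scikit-learn and all function names are illustrative assumptions, not the survey's prescription.

```python
# Minimal sketch: importance weighting under covariate shift, where
# source and target share P(y|x) but differ in P(x). Source examples are
# re-weighted by an estimated density ratio P_target(x)/P_source(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_source, X_target):
    """Estimate P_target(x)/P_source(x) via a source-vs-target classifier."""
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]   # P(example came from target | x)
    return p / (1.0 - p)                    # density ratio, up to a constant

def train_weighted(X_source, y_source, X_target):
    # Weighted empirical risk minimization on labeled source data only.
    w = covariate_shift_weights(X_source, X_target)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_source, y_source, sample_weight=w)
    return model
```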
Classification in Networked Data: A toolkit and a univariate case study
, 2006
"... This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning resear ..."
Abstract
-
Cited by 200 (10 self)
- Add to MetaCart
This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no previous work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, namely Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that two sets of techniques are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
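The relational-neighbor classifier the abstract singles out is simple enough to sketch directly. Below is a minimal, assumption-laden rendering of the weighted-vote relational-neighbor (wvRN) idea for binary classes; the adjacency-list format and the clamped relaxation-labeling loop are implementation choices, not NetKit's actual API.

```python
# Sketch of a weighted-vote relational-neighbor classifier: a node's
# class score is the weighted average of its neighbors' current scores,
# iterated to a fixed point, with known labels clamped throughout.
def wvrn(adj, labels, n_iter=50):
    """adj: {node: [(neighbor, weight), ...]}; labels: {node: 0.0 or 1.0}
    for known nodes. Returns P(class=1) estimates for all nodes."""
    score = {v: labels.get(v, 0.5) for v in adj}   # unknowns start at prior 0.5
    for _ in range(n_iter):
        new = {}
        for v, nbrs in adj.items():
            if v in labels:                        # known labels stay clamped
                new[v] = labels[v]
                continue
            total = sum(w for _, w in nbrs)
            new[v] = sum(w * score[u] for u, w in nbrs) / total if total else 0.5
        score = new
    return score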
Collective classification in network data
, 2008
"... Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification te ..."
Abstract
-
Cited by 178 (32 self)
- Add to MetaCart
(Show Context)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
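One of the widely used inference procedures such reports typically cover is the Iterative Classification Algorithm (ICA). The sketch below is a hedged illustration of that scheme, not the report's code: a pre-fitted local classifier (`clf`, an assumption here) repeatedly re-labels each node using relational features built from its neighbors' current labels.

```python
# Sketch of ICA for binary labels: bootstrap labels from per-node
# features alone, then iterate, re-classifying each node with a feature
# vector extended by a histogram of its neighbors' current labels.
def ica(nodes, neighbors, local_features, clf, n_iter=10):
    """nodes: list of ids; neighbors: {id: [ids]}; local_features: id -> list
    of floats; clf: fitted classifier over [local features + 2 label counts]."""
    # Bootstrap pass: relational features are zeroed out initially.
    label = {v: clf.predict([local_features(v) + [0.0, 0.0]])[0] for v in nodes}
    for _ in range(n_iter):
        for v in nodes:
            counts = [0.0, 0.0]                    # neighbor label histogram
            for u in neighbors[v]:
                counts[int(label[u])] += 1.0
            label[v] = clf.predict([local_features(v) + counts])[0]
    return label
```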
ProbLog: a probabilistic Prolog and its application in link discovery
- In Proceedings of the 20th International Joint Conference on Artificial Intelligence
, 2007
"... We introduce ProbLog, a probabilistic extension of Prolog. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is then defi ..."
Abstract
-
Cited by 144 (27 self)
- Add to MetaCart
We introduce ProbLog, a probabilistic extension of Prolog. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is then defined by the success probability of a query, which corresponds to the probability that the query succeeds in a randomly sampled program. The key contribution of this paper is the introduction of an effective solver for computing success probabilities. It essentially combines SLD-resolution with methods for computing the probability of Boolean formulae. Our implementation further employs an approximation algorithm that combines iterative deepening with binary decision diagrams. We report on experiments in the context of discovering links in real biological networks, demonstrating the practical usefulness of the approach.
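The sampled-program semantics can be shown in a few lines by brute force. The sketch below enumerates all 2^n subprograms and sums the probability of those in which some proof of the query survives; this mirrors the semantics only, not ProbLog's BDD-based solver, and the fact/proof representation is an assumption.

```python
# Brute-force rendering of ProbLog's semantics: the success probability
# of a query is the total probability mass of clause subsets (sampled
# programs) in which the query is provable. Exponential; for illustration.
from itertools import combinations

def success_probability(prob_facts, proofs):
    """prob_facts: {fact: p}; proofs: list of fact-sets, each a minimal proof
    of the query. The query succeeds in a world iff it contains some proof."""
    facts = list(prob_facts)
    total = 0.0
    for k in range(len(facts) + 1):
        for world in combinations(facts, k):       # enumerate all 2^n worlds
            w = set(world)
            p = 1.0
            for f in facts:
                p *= prob_facts[f] if f in w else (1.0 - prob_facts[f])
            if any(proof <= w for proof in proofs):  # query provable here?
                total += p
    return total

# e.g. two independent probabilistic edges, query provable via either:
# success_probability({"e1": 0.8, "e2": 0.6}, [{"e1"}, {"e2"}])
# = 1 - 0.2 * 0.4 = 0.92
```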
Sound and efficient inference with probabilistic and deterministic dependencies
, 2006
"... Reasoning with both probabilistic and deterministic depen-dencies is important for many real-world problems, and in particular for the emerging field of statistical relational learn-ing. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deter-min ..."
Abstract
-
Cited by 133 (17 self)
- Add to MetaCart
(Show Context)
Reasoning with both probabilistic and deterministic dependencies is important for many real-world problems, and in particular for the emerging field of statistical relational learning. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deterministic or near-deterministic dependencies are present, and logical ones like satisfiability testing are inapplicable to probabilistic ones. In this paper we propose MC-SAT, an inference algorithm that combines ideas from MCMC and satisfiability. MC-SAT is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic. From the point of view of MCMC, MC-SAT is a slice sampler with an auxiliary variable per clause, and with a satisfiability-based method for sampling the original variables given the auxiliary ones. From the point of view of satisfiability, MC-SAT wraps a procedure around the SampleSAT uniform sampler that enables it to sample from highly non-uniform distributions over satisfying assignments. Experiments on entity resolution and collective classification problems show that MC-SAT greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.
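The slice-sampling view described above translates into a short loop. The following is a schematic sketch under stated assumptions: `sample_sat` stands in for the SampleSAT near-uniform sampler, and clauses are represented as boolean predicates over a state.

```python
# Schematic MC-SAT loop: each iteration keeps clause c (currently
# satisfied by the state) with probability 1 - exp(-w_c), via its
# auxiliary slice variable, then draws a near-uniform satisfying
# assignment of the kept clauses.
import math, random

def mc_sat(clauses, weights, state, sample_sat, n_steps=1000):
    """clauses: list of callables state -> bool; weights: list of floats;
    state: initial satisfying assignment (dict var -> bool);
    sample_sat: placeholder for a SampleSAT-style uniform sampler."""
    samples = []
    for _ in range(n_steps):
        kept = [c for c, w in zip(clauses, weights)
                if c(state) and random.random() < 1.0 - math.exp(-w)]
        state = sample_sat(kept, state)  # ~uniform over assignments satisfying kept
        samples.append(dict(state))
    return samples
```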
Lifted first-order probabilistic inference
- In Proceedings of IJCAI-05, 19th International Joint Conference on Artificial Intelligence
, 2005
"... Most probabilistic inference algorithms are specified and processed on a propositional level. In the last decade, many proposals for algorithms accepting first-order specifications have been presented, but in the inference stage they still operate on a mostly propositional representation level. [Poo ..."
Abstract
-
Cited by 126 (8 self)
- Add to MetaCart
Most probabilistic inference algorithms are specified and processed on a propositional level. In the last decade, many proposals for algorithms accepting first-order specifications have been presented, but in the inference stage they still operate on a mostly propositional representation level. [Poole, 2003] presented a method to perform inference directly on the first-order level, but this method is limited to special cases. In this paper we present the first exact inference algorithm that operates directly on a first-order level, and that can be applied to any first-order model (specified in a language that generalizes undirected graphical models). Our experiments show superior performance in comparison with propositional exact inference.
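A toy example of why lifting pays off: when n parents of a node are interchangeable, exact inference can sum over the count of true parents (n+1 terms weighted by binomial coefficients) rather than grounding all 2^n joint states as a propositional algorithm would. The noisy-OR model below is chosen for concreteness and is not the paper's language.

```python
# Lifted computation over n exchangeable parents: sum over counts,
# not over 2^n ground configurations.
from math import comb

def lifted_noisy_or(n, p, q):
    """P(Y=1) where Y is a noisy-OR of n i.i.d. parents with P(X_i=1)=p,
    each true parent activating Y independently with probability q."""
    total = 0.0
    for k in range(n + 1):                           # count of true parents
        p_k = comb(n, k) * p**k * (1 - p)**(n - k)   # binomial, not 2^n worlds
        total += p_k * (1.0 - (1.0 - q)**k)
    return total

# lifted_noisy_or(1000, 0.01, 0.3) runs in O(n) rather than O(2^n).
```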
Extracting places and activities from GPS traces using hierarchical conditional random fields
- International Journal of Robotics Research
, 2007
"... Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent mod ..."
Abstract
-
Cited by 119 (3 self)
- Add to MetaCart
(Show Context)
Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes high-level context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons.
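Before any labeling, such a system must propose candidate significant places from the raw trace. The sketch below uses a common stay-point heuristic (linger within a small radius for a minimum duration), which is an assumption chosen for illustration, not the paper's exact detection method.

```python
# Stay-point detection sketch: a candidate place is anywhere the trace
# stays within max_dist_m of an anchor point for at least min_stay_s.
def stay_points(trace, max_dist_m=30.0, min_stay_s=300.0, dist=None):
    """trace: list of (t_seconds, x_m, y_m) in a local metric frame.
    Returns a list of (x, y) centroids of detected stay points."""
    if dist is None:
        dist = lambda a, b: ((a[1] - b[1])**2 + (a[2] - b[2])**2) ** 0.5
    places, i = [], 0
    while i < len(trace):
        j = i
        while j + 1 < len(trace) and dist(trace[i], trace[j + 1]) <= max_dist_m:
            j += 1                                   # extend the dwell window
        if trace[j][0] - trace[i][0] >= min_stay_s:  # stayed long enough?
            xs = [p[1] for p in trace[i:j + 1]]
            ys = [p[2] for p in trace[i:j + 1]]
            places.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        i = j + 1
    return places
```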
Joint inference in information extraction
- In Proceedings of the 22nd National Conference on Artificial Intelligence
, 2007
"... The goal of information extraction is to extract database records from text or semi-structured sources. Traditionally, information extraction proceeds by first segmenting each candidate record separately, and then merging records that refer to the same entities. While computationally efficient, this ..."
Abstract
-
Cited by 119 (8 self)
- Add to MetaCart
(Show Context)
The goal of information extraction is to extract database records from text or semi-structured sources. Traditionally, information extraction proceeds by first segmenting each candidate record separately, and then merging records that refer to the same entities. While computationally efficient, this approach is suboptimal, because it ignores the fact that segmenting one candidate record can help to segment similar ones. For example, resolving a well-segmented field with a less clear one can disambiguate the latter’s boundaries. In this paper we propose a joint approach to information extraction, where segmentation of all records and entity resolution are performed together in a single integrated inference process. While a number of previous authors have taken steps in this direction (e.g., Pasula et al. (2003), Wellner et al. (2004)), to our knowledge this is the first fully joint approach. In experiments on the CiteSeer and Cora citation matching datasets, joint inference improved accuracy, and our approach outperformed previous ones. Further, by using Markov logic and the existing algorithms for it, our solution consisted mainly of writing the appropriate logical formulas, and required much less engineering than previous ones.
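The intuition that a well-segmented record can disambiguate a similar one can be illustrated without any Markov logic machinery. The toy sketch below transfers a title boundary from a clean duplicate to a noisy citation string via approximate matching; it is a reconstruction of the intuition only, not the paper's formulation.

```python
# Toy version of segmentation-by-resolution: if two citation strings
# likely refer to the same paper, the title extracted from the clean
# record can locate the title boundaries in the noisy one.
from difflib import SequenceMatcher

def transfer_title(clean_title, noisy_citation, min_ratio=0.8):
    """Find a span of noisy_citation approximately matching a title
    already extracted from a well-segmented duplicate record."""
    m = SequenceMatcher(None, clean_title.lower(), noisy_citation.lower())
    a, b, size = m.find_longest_match(0, len(clean_title), 0, len(noisy_citation))
    if size / max(len(clean_title), 1) >= min_ratio:
        return b, b + size                 # title boundaries in the noisy record
    return None

# clean = "Learning the Structure of Markov Logic Networks"
# noisy = "Kok S. Learning the structure of markov logic networks ICML 05"
# transfer_title(clean, noisy) -> character span of the title in noisy
```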
Learning the structure of Markov logic networks
- In Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive l ..."
Abstract
-
Cited by 116 (21 self)
- Add to MetaCart
(Show Context)
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
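Reduced to its greedy core, the search described above looks like the following sketch. `refinements` and `fit_weights_wpll` stand in for the paper's clause-construction operators and weight optimizer (which returns the weighted pseudo-log-likelihood at the optimal weights); both are assumptions here.

```python
# Greedy clause search over MLN structures scored by weighted
# pseudo-log-likelihood (WPLL): repeatedly add the candidate clause
# whose optimally weighted MLN scores best, stopping when nothing helps.
def greedy_structure_search(initial_clauses, refinements, fit_weights_wpll,
                            max_clauses=20):
    mln = list(initial_clauses)
    best = fit_weights_wpll(mln)           # WPLL of the MLN at optimal weights
    while len(mln) < max_clauses:
        scored = [(fit_weights_wpll(mln + [c]), c) for c in refinements(mln)]
        if not scored:
            break
        top, clause = max(scored, key=lambda t: t[0])
        if top <= best:                    # no candidate improves the score
            break
        mln.append(clause)                 # keep the best refinement
        best = top
    return mln, best
```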
Lifted first-order belief propagation
- In Association for the Advancement of Artificial Intelligence (AAAI)
, 2008
"... Unifying first-order logic and probability is a long-standing goal of AI, and in recent years many representations combining aspects of the two have been proposed. However, inference in them is generally still at the level of propositional logic, creating all ground atoms and formulas and applying s ..."
Abstract
-
Cited by 115 (15 self)
- Add to MetaCart
(Show Context)
Unifying first-order logic and probability is a long-standing goal of AI, and in recent years many representations combining aspects of the two have been proposed. However, inference in them is generally still at the level of propositional logic, creating all ground atoms and formulas and applying standard probabilistic inference methods to the resulting network. Ideally, inference should be lifted as in first-order logic, handling whole sets of indistinguishable objects together, in time independent of their cardinality. Poole (2003) and Braz et al. (2005, 2006) developed a lifted version of the variable elimination algorithm, but it is extremely complex, generally does not scale to realistic domains, and has only been applied to very small artificial problems. In this paper we propose the first lifted version of a scalable probabilistic inference algorithm, belief propagation (loopy or not). Our approach is based on first constructing a lifted network, where each node represents a set of ground atoms that all pass the same messages during belief propagation. We then run belief propagation on this network. We prove the correctness and optimality of our algorithm. Experiments show that it can greatly reduce the cost of inference.
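The first stage described above, grouping ground atoms that would exchange identical messages, can be approximated by color passing. The sketch below is a simplified version (it ignores an atom's argument position within a factor), and the factor-graph encoding is an assumption.

```python
# Simplified color-passing construction of a lifted network: atoms that
# end up with the same color after refinement form one supernode, since
# they would send identical BP messages.
def lift_by_color_passing(atoms, factors, n_rounds=10):
    """atoms: list of ground atoms; factors: list of (factor_type, [atoms]).
    Returns a partition of atoms into supernodes."""
    color = {a: 0 for a in atoms}                  # all atoms start alike
    for _ in range(n_rounds):
        signature = {}
        for a in atoms:
            # An atom's new color summarizes the colored factors around it.
            sig = sorted((ftype, tuple(sorted(color[b] for b in members)))
                         for ftype, members in factors if a in members)
            signature[a] = (color[a], tuple(sig))
        palette = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        new_color = {a: palette[signature[a]] for a in atoms}
        if new_color == color:                     # stable: partition found
            break
        color = new_color
    supernodes = {}
    for a, c in color.items():
        supernodes.setdefault(c, []).append(a)
    return list(supernodes.values())
```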