Results 1 - 10 of 129
The jackknife—a review.
- Biometrika, 1974
"... The Light Beyond, By Raymond A. Moody, Jr. with Paul Perry. New York, NY: Bantam Books, 1988, 161 pp., $18.95 In his foreword to this book, Andrew Greeley, a prominent priest and sociologist, introduces his comments with the following statement: "Raymond Moody has achieved a rare feat in th ..."
Cited by 104 (0 self)
The Light Beyond, By Raymond A. Moody, Jr. with Paul Perry. New York, NY: Bantam Books, 1988, 161 pp., $18.95. In his foreword to this book, Andrew Greeley, a prominent priest and sociologist, introduces his comments with the following statement: "Raymond Moody has achieved a rare feat in the quest for human knowledge; he has created a paradigm." He then refers to Thomas Kuhn, who pointed out in The Structure of Scientific Revolutions that scientific revolutions occur when someone creates a new perspective, a new model, a new approach to reality. Although Greeley acknowledges that Moody did not discover the near-death experience (NDE), he contends that because Moody put a name to it in his previous bestseller Life After Life (1975), he therefore deserves credit for the new paradigm that has evolved. Greeley then refers to The Light Beyond as characterized by Moody's "openness, sensitivity and modesty." This he attributes to Moody's acknowledgement that the NDE does not represent proof of life after death; rather, it indicates only the existence and widespread prevalence of the NDE. I must question why Greeley does not comment more on the content of the book, and why Moody felt it was appropriate to be credited with creating a new paradigm. During the last fourteen years since Life
Multi-criteria Reinforcement Learning
, 1998
"... We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology int ..."
Cited by 34 (0 self)
We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order-topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium-term multicriteria RL often converges to better solutions (measured by the first criterion) than their single-criterion counterparts. These types of multicriteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
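The fixed total ordering mentioned in the abstract can be made concrete with a lexicographic comparison of vector-valued Q-estimates. The following is a minimal sketch, not the authors' algorithm: tabular Q-learning with vector rewards whose greedy action is picked lexicographically. The environment interface (`reset`, `step`, the number of criteria) is hypothetical.

```python
import numpy as np

def lex_argmax(q_row):
    """Index of the lexicographically largest vector among the action values."""
    return max(range(len(q_row)), key=lambda a: tuple(q_row[a]))

def multi_criteria_q_learning(env, n_states, n_actions, n_criteria,
                              episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    # Q[s, a] is a vector of per-criterion value estimates.
    Q = np.zeros((n_states, n_actions, n_criteria))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy, with the greedy action defined by the lexicographic order
            a = rng.integers(n_actions) if rng.random() < eps else lex_argmax(Q[s])
            s2, r_vec, done = env.step(a)          # r_vec has one entry per criterion
            a2 = lex_argmax(Q[s2])
            target = np.asarray(r_vec) + gamma * Q[s2, a2] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])  # component-wise TD update
            s = s2
    return Q
```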
Importance sampling for reinforcement learning with multiple objectives
, 2001
"... OFTECHNOLOGY hairman, ..."
Pareto-Based Multiobjective Machine Learning: An Overview and Case Studies
, 2008
"... Machine learning is inherently a multiobjective task. Traditionally, however, either only one of the objectives is adopted as the cost function or multiple objectives are aggregated to a scalar cost function. This can be mainly attributed to the fact that most conventional learning algorithms can o ..."
Cited by 33 (1 self)
Machine learning is inherently a multiobjective task. Traditionally, however, either only one of the objectives is adopted as the cost function or multiple objectives are aggregated to a scalar cost function. This can be mainly attributed to the fact that most conventional learning algorithms can only deal with a scalar cost function. Over the last decade, efforts on solving machine learning problems using the Pareto-based multiobjective optimization methodology have gained increasing impetus, particularly due to the great success of multiobjective optimization using evolutionary algorithms and other population-based stochastic search methods. It has been shown that Pareto-based multiobjective learning approaches are more powerful compared to learning algorithms with a scalar cost function in addressing various topics of machine learning, such as clustering, feature selection, improvement of generalization ability, knowledge extraction, and ensemble generation. One common benefit of the different multiobjective learning approaches is that a deeper insight into the learning problem can be gained by analyzing the Pareto front composed of multiple Pareto-optimal solutions. This paper provides an overview of the existing research on multiobjective machine learning, focusing on supervised learning. In addition, a number of case studies are provided to illustrate the major benefits of the Pareto-based approach to machine learning, e.g., how to identify interpretable models and models that can generalize on unseen data from the obtained Pareto-optimal solutions. Three approaches to Pareto-based multiobjective ensemble generation are compared and discussed in detail. Finally, potentially interesting topics in multiobjective machine learning are suggested.
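The central notion the survey builds on is Pareto dominance over a set of candidate models. A small, self-contained illustration (not taken from the paper) of filtering candidates scored on two objectives, e.g. validation error and model complexity, both to be minimized:

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in at least one.
    All objectives are assumed to be minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical candidate models scored by (validation error, model complexity).
candidates = [(0.12, 30), (0.10, 45), (0.15, 10), (0.10, 50), (0.09, 80)]
print(pareto_front(candidates))   # -> [(0.12, 30), (0.10, 45), (0.15, 10), (0.09, 80)]
```

Inspecting such a front, rather than a single scalar-optimal model, is the "deeper insight" the abstract refers to: each surviving point is a different trade-off between the objectives.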
Modelling transition dynamics in MDPs with RKHS embeddings
- In arXiv, 2012
"... We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This approach makes use of a recently developed representation of ..."
Cited by 20 (9 self)
We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This approach makes use of a recently developed representation of conditional distributions as embeddings in a reproducing kernel Hilbert space (RKHS). Such representations bypass the need for estimating transition probabilities or densities, and apply to any domain on which kernels can be defined. This avoids the need to calculate intractable integrals, since expectations are represented as RKHS inner products whose computation has linear complexity in the number of points used to represent the embedding. We provide guarantees for the proposed applications in MDPs: in the context of a value iteration algorithm, we prove convergence to either the optimal policy, or to the closest projection of the optimal policy in our model class (an RKHS), under reasonable assumptions. In experiments, we investigate a learning task in a typical classical control setting (the under-actuated pendulum), and on a navigation problem where only images from a sensor are observed. For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation. For value estimation we also compare to the NPDP method. Our approach achieves better performance in all experiments.
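For intuition on the representation the abstract refers to, here is a simplified sketch of the standard conditional-mean-embedding estimator built from transition samples {(s_i, s'_i)}: the expectation of a function over next states reduces to an inner product whose weights come from a regularized Gram matrix, so evaluation is linear in the number of samples. This is illustrative only, with made-up toy dynamics and a Gaussian kernel; it is not the paper's value-iteration algorithm.

```python
import numpy as np

def gauss_kernel(X, Y, bw=0.5):
    # Gaussian kernel matrix between two sets of points (rows are points).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(200, 1))                  # sampled states
S_next = 0.9 * S + 0.05 * rng.normal(size=S.shape)     # toy linear dynamics (assumed)

lam = 1e-4
K = gauss_kernel(S, S)
# Precompute (K + lam * n * I)^{-1}; afterwards each query is linear in the sample count.
W = np.linalg.solve(K + lam * len(S) * np.eye(len(S)), np.eye(len(S)))

def expected_f(f, s_query):
    """Approximate E[f(s') | s = s_query] as f(S_next)^T W k(S, s_query)."""
    alpha = W @ gauss_kernel(S, s_query.reshape(1, -1))   # weights over the samples
    return (f(S_next).T @ alpha).item()

print(expected_f(lambda x: x, np.array([0.5])))   # roughly 0.9 * 0.5 = 0.45 for these toy dynamics
```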
Dynamic Preferences in Multi-Criteria Reinforcement Learning
- In Proceedings of ICML-05, 2005
"... The current framework of reinforcement learning is based on maximizing the expected returns based on scalar rewards. But in many real world situations, tradeoffs must be made among multiple objectives. Moreover, the agent’s preferences between different objectives may vary with time. In this paper, ..."
Cited by 20 (3 self)
The current framework of reinforcement learning is based on maximizing the expected returns based on scalar rewards. But in many real world situations, tradeoffs must be made among multiple objectives. Moreover, the agent’s preferences between different objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector and improve upon it. The idea is that although there are infinitely many weight vectors, they may be well-covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan’s ass problem and network routing.
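The policy-library idea can be stated in a few lines. The sketch below (with made-up policy names and value vectors) stores a vector-valued return estimate per policy and, for any preference weight vector, serves the stored policy with the highest scalarized value:

```python
import numpy as np

# Hypothetical library: each stored policy comes with an estimated vector-valued
# return, here over two objectives [throughput, fairness].
policy_library = {
    "throughput_first": np.array([9.0, 2.0]),
    "balanced":         np.array([6.0, 6.0]),
    "fairness_first":   np.array([2.0, 9.0]),
}

def best_policy(weights):
    """Pick the stored policy maximizing the scalarized value w . V_pi."""
    w = np.asarray(weights)
    return max(policy_library, key=lambda name: float(w @ policy_library[name]))

print(best_policy([0.9, 0.1]))   # -> throughput_first
print(best_policy([0.5, 0.5]))   # -> balanced
print(best_policy([0.1, 0.9]))   # -> fairness_first
```

A small library covers the whole weight simplex whenever, as the abstract argues, many weight vectors share the same optimal policy.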
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
"... Bayesian learning methods have recently been shown to provide an elegant solution to the explorationexploitation trade-off in reinforcement learning. However most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). The primary focus of th ..."
Cited by 20 (2 self)
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains, by introducing the Bayes-Adaptive Partially Observable Markov Decision Processes. This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and agent’s return improve as a function of experience. Keywords: reinforcement learning, Bayesian inference, partially observable Markov decision processes
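One simple way to picture a joint belief over hidden states and unknown model parameters is a particle set in which each particle carries a state hypothesis together with Dirichlet transition counts. The sketch below is illustrative only and is not the paper's approximation scheme; the observation model is treated as known and all numbers are made up, purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_obs, n_particles = 2, 2, 2, 200
O = np.array([[0.8, 0.2],      # assumed-known P(obs | state)
              [0.3, 0.7]])

# particle i: (state hypothesis, Dirichlet counts over transitions counts[a, s, s'])
particles = [(rng.integers(n_states),
              np.ones((n_actions, n_states, n_states)))
             for _ in range(n_particles)]
weights = np.ones(n_particles) / n_particles

def belief_update(action, obs):
    """Advance every particle through `action`, then reweight by P(obs | s')."""
    global particles, weights
    new_particles, new_weights = [], []
    for (s, counts), w in zip(particles, weights):
        probs = counts[action, s] / counts[action, s].sum()   # posterior-mean model
        s2 = rng.choice(n_states, p=probs)
        counts = counts.copy()
        counts[action, s, s2] += 1.0                          # model learning happens here
        new_particles.append((s2, counts))
        new_weights.append(w * O[s2, obs])
    weights = np.array(new_weights)
    weights /= weights.sum()
    particles = new_particles

belief_update(action=0, obs=1)
print("P(state=1) approx:", sum(w for (s, _), w in zip(particles, weights) if s == 1))
```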
Optimal approximate dynamic programming algorithms for a general class of storage problems
, 2007
"... informs doi 10.1287/moor.1080.0360 ..."
Reinforcement Learning with Bounded Risk
- In Proceedings of the Eighteenth International Conference on Machine Learning, 2001
"... In this paper, we consider nite MDPs with fatal states. We dene the risk under a policy as the probability of entering a fatal state, which is dierent to the notion of risk normally used in DP and RL (most often regarding the variance of the return). We consider the problem of nding optimal po ..."
Cited by 18 (2 self)
In this paper, we consider finite MDPs with fatal states. We define the risk under a policy as the probability of entering a fatal state, which is different from the notion of risk normally used in DP and RL (most often regarding the variance of the return). We consider the problem of finding optimal policies with bounded risk, i.e. where the risk is smaller than some user-specified threshold ω, and formalize it as a constrained MDP with two infinite horizon criteria – a discounted one for the value of a state and an undiscounted criterion for the risk. We define a heuristic, model-free reinforcement learning algorithm that finds good deterministic policies for the constrained problem. The algorithm is based on an abstract ordering of the multi-dimensional return space. It uses a weighted formulation of the problem. The internal weight parameter is adjusted by a heuristic optimization algorithm.
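A rough reading of the weighted formulation: learn a discounted value estimate and an undiscounted risk estimate per state-action pair, act greedily on their weighted difference, and adapt the internal weight so that the estimated risk stays below the threshold ω. The sketch below is a heuristic paraphrase under a hypothetical environment interface (where `step` also reports whether the episode ended in a fatal state); it is not the authors' exact algorithm.

```python
import numpy as np

def risk_bounded_q_learning(env, n_states, n_actions, omega=0.05,
                            episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    Qv = np.zeros((n_states, n_actions))   # discounted return estimate
    Qr = np.zeros((n_states, n_actions))   # estimated probability of reaching a fatal state
    xi = 1.0                               # internal weight on the risk term
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s0 = env.reset()
        s, done = s0, False
        while not done:
            scores = Qv[s] - xi * Qr[s]
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(scores))
            s2, r, done, fatal = env.step(a)       # `fatal` marks entering a fatal state
            a2 = int(np.argmax(Qv[s2] - xi * Qr[s2]))
            Qv[s, a] += alpha * (r + gamma * Qv[s2, a2] * (not done) - Qv[s, a])
            # undiscounted risk target: 1 if this step was fatal, else bootstrap
            risk_target = 1.0 if fatal else Qr[s2, a2] * (not done)
            Qr[s, a] += alpha * (risk_target - Qr[s, a])
            s = s2
        # crude weight adaptation: penalize risk more when the constraint is violated
        greedy = int(np.argmax(Qv[s0] - xi * Qr[s0]))
        xi *= 1.05 if Qr[s0, greedy] > omega else 0.99
    return Qv, Qr, xi
```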