Results 1–10 of 1,185
A Bayesian method for the induction of probabilistic networks from data
 MACHINE LEARNING
, 1992
Abstract

Cited by 1381 (32 self)
This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
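The averaging step mentioned above can be sketched in a few lines. This is a toy rendering, not the paper's method: the candidate networks' predictions and posterior weights below are hypothetical inputs, standing in for quantities the paper computes from data.

```python
# Toy sketch of model averaging: P(x | D) is approximated by a
# posterior-weighted average of each candidate network's prediction.
# The predictions and (possibly unnormalized) posteriors are hypothetical.

def average_inference(predictions, posteriors):
    """predictions: P(x | B_i) per network; posteriors: P(B_i | D)."""
    total = sum(posteriors)
    return sum(p * w for p, w in zip(predictions, posteriors)) / total

# Three hypothetical candidate networks predicting P(x = 1).
p_x = average_inference([0.9, 0.7, 0.2], [0.6, 0.3, 0.1])
```

A single best-scoring network would return one of the three predictions outright; averaging hedges between them in proportion to posterior support.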
Path Planning Using Lazy PRM
 In IEEE Int. Conf. Robot. & Autom.
, 2000
Abstract

Cited by 247 (18 self)
This paper describes a new approach to probabilistic roadmap planners (PRMs). The overall theme of the algorithm, called Lazy PRM, is to minimize the number of collision checks performed during planning and hence minimize the running time of the planner. Our algorithm builds a roadmap in the configuration space, whose nodes are the userdefined initial and goal configurations and a number of randomly generated nodes. Neighboring nodes are connected by edges representing paths between the nodes. In contrast with PRMs, our planner initially assumes that all nodes and edges in the roadmap are collisionfree, and searches the roadmap at hand for a shortest path between the initial and the goal node. The nodes and edges along the path are then checked for collision. If a collision with the obstacles occurs, the corresponding nodes and edges are removed from the roadmap. Our planner either finds a new shortest path, or first updates the roadmap with new nodes and edges, and then searches for a shortest path. The above process is repeated until a collisionfree path is returned.
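The search-check-remove loop described above can be sketched as follows. This is a minimal toy version, not the authors' implementation: the roadmap, edge costs, and the `blocked` collision checker are hypothetical, and a full planner would also add new random samples when the current roadmap is exhausted.

```python
import heapq

def lazy_prm(adj, start, goal, in_collision):
    """Lazy shortest-path loop: assume all edges are free, check only the
    edges on the returned path, remove colliding ones, and repeat.
    adj: {node: {neighbor: cost}}; in_collision(u, v) -> bool.
    """
    while True:
        path = dijkstra(adj, start, goal)
        if path is None:
            return None  # roadmap exhausted; a full planner would resample
        bad = [(u, v) for u, v in zip(path, path[1:]) if in_collision(u, v)]
        if not bad:
            return path  # every edge on the path checked collision-free
        for u, v in bad:  # lazily invalidate only the colliding edges
            adj[u].pop(v, None)
            adj[v].pop(u, None)

def dijkstra(adj, start, goal):
    """Standard Dijkstra returning the node sequence from start to goal."""
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return None

# Hypothetical three-node roadmap where the direct start-goal edge collides.
roadmap = {"s": {"a": 1.0, "g": 1.5},
           "a": {"s": 1.0, "g": 1.0},
           "g": {"s": 1.5, "a": 1.0}}
blocked = lambda u, v: {u, v} == {"s", "g"}   # stand-in collision checker
path = lazy_prm(roadmap, "s", "g", blocked)
```

On this roadmap the planner first proposes the direct edge s-g, discovers it collides, removes it, and returns the detour through a.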
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms
, 2000
Abstract

Cited by 225 (8 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based, algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
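The "mean rank of error rate" criterion above can be computed as a small sketch: rank the algorithms within each dataset (averaging ties), then average ranks across datasets. The error rates below are hypothetical, not the paper's results.

```python
def ranks(values):
    """Rank a list of error rates (1 = lowest error), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the tie group
        avg = (i + j) / 2 + 1           # average rank over the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

# Hypothetical error rates: rows are datasets, columns are algorithms.
errors = [[0.10, 0.12, 0.12],
          [0.20, 0.15, 0.25]]
per_dataset = [ranks(row) for row in errors]
mean_rank = [sum(col) / len(col) for col in zip(*per_dataset)]
```

Mean rank is less sensitive than mean error rate to a single dataset on which all algorithms do badly, which is why the paper reports both.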
Postural hand synergies for tool use
 The Journal of Neuroscience
, 1998
Abstract

Cited by 133 (3 self)
Subjects were asked to shape the right hand as if to grasp and use a large number of familiar objects. The chosen objects typically are held with a variety of grips, including “precision” and “power” grips. Static hand posture was measured by recording the angular position of 15 joint angles of the fingers and of the thumb. Although subjects adopted distinct hand shapes for the various objects, the joint angles of the digits did not vary independently. Principal components analysis showed that the first two components could account for ~80% of the variance, implying a substantial reduction from the 15 degrees of freedom that were recorded. However, even though they were small, higher-order (more than three) principal components did not represent random variability but instead provided additional information about the object. These results suggest that the control of hand posture involves a few postural synergies, ...
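The dimensionality-reduction claim can be illustrated on a synthetic stand-in for the recorded joint angles. The data below are generated from two hypothetical latent "synergies" plus noise (not the study's measurements), so the first two principal components dominate by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the data: 50 postures x 15 joint angles,
# generated from 2 latent synergies plus small noise.
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 15))
angles = latent @ mixing + 0.1 * rng.normal(size=(50, 15))

# PCA via SVD of the mean-centered data: squared singular values give
# the variance captured by each principal component.
centered = angles - angles.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Smallest number of components reaching 80% of total variance.
n_for_80 = int(np.searchsorted(cumulative, 0.80)) + 1
```

On rank-2 synthetic data, one or two components already clear the 80% threshold; on the real 15-joint recordings the paper reports roughly the same picture.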
Portfolio Selection with Parameter and Model Uncertainty: A Multi-Prior Approach
, 2006
Abstract

Cited by 108 (4 self)
We develop a model for an investor with multiple priors and aversion to ambiguity. We characterize the multiple priors by a "confidence interval" around the estimated expected returns and we model ambiguity aversion via a minimization over the priors. Our model has several attractive features: (1) it has a solid axiomatic foundation; (2) it is flexible enough to allow for different degrees of uncertainty about expected returns for various subsets of assets and also about the return-generating model; and (3) it delivers closed-form expressions for the optimal portfolio. Our empirical analysis suggests that, compared with portfolios from classical and Bayesian models, ambiguity-averse portfolios are more stable over time and deliver a higher out-of-sample Sharpe ratio.
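The min-over-priors idea can be sketched in a simplified form: take the worst-case mean inside a per-asset confidence interval (which shrinks each estimated mean toward zero, to zero when ambiguity swamps the estimate), then apply the standard unconstrained mean-variance rule. This mirrors the qualitative effect, not the paper's exact closed form; every number below is hypothetical.

```python
import numpy as np

def ambiguity_averse_weights(mu_hat, eps, cov, gamma):
    """Max-min sketch: the worst prior in the box [mu_hat - eps, mu_hat + eps]
    shrinks each estimated mean toward zero by its ambiguity radius eps;
    then apply the mean-variance rule w = (1 / gamma) Cov^-1 mu_worst.
    """
    mu_worst = np.sign(mu_hat) * np.maximum(np.abs(mu_hat) - eps, 0.0)
    return np.linalg.solve(cov, mu_worst) / gamma

mu_hat = np.array([0.08, 0.03])       # hypothetical estimated means
eps = np.array([0.02, 0.04])          # more ambiguity about asset 2
cov = np.array([[0.04, 0.00],
                [0.00, 0.09]])
w = ambiguity_averse_weights(mu_hat, eps, cov, gamma=2.0)
```

Asset 2's ambiguity radius exceeds its estimated mean, so the worst-case mean and hence its weight are exactly zero; this hard shrinkage is what makes such portfolios more stable than plug-in mean-variance ones.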
Performance Evaluation of Some Clustering Algorithms and Validity Indices
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
Abstract

Cited by 104 (2 self)
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between index I and Dunn's index, a lower bound on the value of the former is theoretically estimated in order to obtain a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
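As a reference point for the indices compared above, the Davies-Bouldin index can be sketched as follows; the two-blob data set is a hypothetical illustration, not the paper's benchmark.

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies-Bouldin index: average, over clusters, of the worst ratio
    (s_i + s_j) / d(c_i, c_j), where s_k is mean within-cluster distance
    to the centroid c_k. Lower values indicate a better partition.
    """
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    scatter = np.array([np.mean(np.linalg.norm(points[labels == k] - c, axis=1))
                        for k, c in zip(ids, centroids)])
    db = 0.0
    for i in range(len(ids)):
        worst = max((scatter[i] + scatter[j]) /
                    np.linalg.norm(centroids[i] - centroids[j])
                    for j in range(len(ids)) if j != i)
        db += worst
    return db / len(ids)

# Two tight, well-separated blobs should yield a small index value.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
score = davies_bouldin(pts, np.array([0, 0, 0, 1, 1, 1]))
```

Running the index over partitions with K = 2, ..., 10 and picking the minimum is the typical way such an index "evolves" the number of clusters.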
Statistically rigorous Java performance evaluation
 In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)
, 2007
Abstract

Cited by 103 (5 self)
Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, nondeterminism at runtime causes the execution time of a Java program to differ from run to run. There are a number of sources of nondeterminism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects. There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times. This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with nondeterminism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
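The statistically rigorous alternative to reporting a best or average run is to report a mean with a confidence interval across independent runs. A minimal sketch of that computation (not the JavaStats tool itself; the timings are hypothetical measurements in milliseconds):

```python
import math
import statistics

def mean_ci(times, z=1.96):
    """Mean and approximate 95% confidence interval for benchmark timings.
    For small run counts a Student t quantile should replace the normal
    quantile z; with many runs the normal approximation is adequate.
    """
    m = statistics.mean(times)
    half = z * statistics.stdev(times) / math.sqrt(len(times))
    return m, (m - half, m + half)

# Hypothetical per-run timings (ms) from independent VM invocations.
runs = [102.1, 99.8, 101.5, 100.9, 98.7, 101.2, 100.4, 99.9]
mean, (lo, hi) = mean_ci(runs)
```

Two systems whose intervals overlap cannot honestly be ranked on these runs, which is exactly the kind of conclusion best-run reporting gets wrong.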
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
 Bioinformatics
, 2003
Abstract

Cited by 101 (6 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
A Linear Method for Deviation Detection in Large Databases
, 1996
Abstract

Cited by 100 (1 self)
We describe the problem of finding deviations in large databases. Normally, explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. In contrast, we approach the problem from the inside of the data, using the implicit redundancy of the data. We give a formal description of the problem and present a linear algorithm for detecting deviations. Our solution simulates a mechanism familiar to human beings: after seeing a series of similar data, an element disturbing the series is considered an exception. We also present experimental results from the application of this algorithm on real-life datasets showing its effectiveness.
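The "element disturbing the series" intuition can be rendered as a toy rule: flag the element whose removal most reduces the variance of the remaining data. This is a deliberate simplification of the paper's dissimilarity-based formulation, shown only to make the intuition concrete; the example values are hypothetical.

```python
def most_deviant(values):
    """Return (index, variance reduction) of the single element whose
    removal most reduces the variance of the remaining values: a toy
    rendering of 'the element disturbing the series'.
    """
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    base = var(values)
    best_i, best_gain = None, 0.0
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        gain = base - var(rest)          # how much "calmer" the series gets
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain

idx, gain = most_deviant([3, 4, 3, 5, 4, 90, 4, 3])
```

This naive scan is quadratic; the paper's contribution is a linear-time formulation of the same idea over sequences of item sets.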
Asymptotically Optimal Importance Sampling and Stratification for Pricing Path-Dependent Options
 Mathematical Finance
, 1999
Abstract

Cited by 90 (13 self)
This paper develops a variance reduction technique for Monte Carlo simulations of path-dependent options driven by high-dimensional Gaussian vectors. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to be optimal in an asymptotic sense. The drift selected has an interpretation as the path of the underlying state variables which maximizes the product of probability and payoff: the most important path. The directions used for stratified sampling are optimal for a quadratic approximation to the integrand or payoff function. Indeed, under differentiability assumptions our importance sampling method eliminates variability due to the linear part of the payoff function, and stratification eliminates much of the variability due to the quadratic part of the payoff. The two parts of the method are linked because the asymptotically optimal drift vector frequently provides a particularly effective direction for stratification. We illustrate the use of the method with path-dependent options, a stochastic volatility model, and interest rate derivatives. The method reveals novel features of the structure of their payoffs.
KEY WORDS: Monte Carlo methods, variance reduction, large deviations, Laplace principle
1. INTRODUCTION
This paper develops a variance reduction technique for Monte Carlo simulations driven by high-dimensional Gaussian vectors, with particular emphasis on the pricing of path-dependent options. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to...
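The change-of-drift idea can be sketched in one dimension: sample the Gaussian under a shifted mean and correct with the likelihood ratio. This toy omits the paper's high-dimensional paths, the large-deviations selection of the drift, and the stratification step; the payoff and drift below are illustrative choices.

```python
import math
import random

def is_estimate(payoff, drift, n, seed=0):
    """Importance sampling for E[payoff(Z)], Z ~ N(0, 1), by sampling
    Z ~ N(drift, 1) and reweighting with the likelihood ratio
    exp(-drift * z + drift**2 / 2). drift = 0 recovers plain Monte Carlo.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(drift, 1.0)
        total += payoff(z) * math.exp(-drift * z + drift * drift / 2)
    return total / n

# Deep out-of-the-money call-style payoff: nonzero only on a rare event
# under the nominal measure, so shifting the drift toward the important
# region (here, z near 3) sharply reduces variance.
payoff = lambda z: max(z - 3.0, 0.0)
plain = is_estimate(payoff, drift=0.0, n=20000)
shifted = is_estimate(payoff, drift=3.0, n=20000)
```

With drift 0, most samples contribute nothing and the estimate is noisy; with the shifted drift nearly every sample lands where the payoff matters, and the likelihood ratio keeps the estimator unbiased. The paper's "most important path" plays the role of the drift value 3 here.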