Results 1 - 10 of 138
Bagging Predictors
- Machine Learning
, 1996
Cited by 1998 (1 self)
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
1. Introduction
A learning set $\mathcal{L}$ consists of data $\{(y_n, x_n),\ n = 1, \ldots, N\}$ where the $y$'s are either class labels or a numerical response. We have a procedure for using this learning set to form a predictor $\varphi(x, \mathcal{L})$; if the input is $x$ we ...
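The procedure this abstract describes (bootstrap replicates of the learning set, one predictor per replicate, then averaging or a plurality vote) can be sketched roughly as follows. This is only an illustration, not the paper's code: the one-split regression stump stands in for the classification and regression trees used in the paper, and `fit_stump`, `plurality_vote`, `bagged_predictor`, and the default of 25 replicates are names and values assumed here.

```python
import random
from collections import Counter
from statistics import mean


def fit_stump(data):
    """Fit a one-split regression stump to (x, y) pairs with scalar x.

    Hypothetical base learner chosen because a single split is unstable
    under resampling; the paper itself uses trees and subset selection.
    """
    xs = sorted({x for x, _ in data})
    overall = mean(y for _, y in data)
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # every x in the replicate was identical
        return lambda x: overall
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def plurality_vote(predictions):
    """Aggregate class predictions by picking the most common label."""
    return Counter(predictions).most_common(1)[0][0]


def bagged_predictor(data, fit, n_replicates=25, aggregate=mean, seed=0):
    """Fit one predictor per bootstrap replicate and aggregate their outputs."""
    rng = random.Random(seed)
    n = len(data)
    models = []
    for _ in range(n_replicates):
        # A bootstrap replicate: n draws with replacement from the learning set.
        replicate = [data[rng.randrange(n)] for _ in range(n)]
        models.append(fit(replicate))
    return lambda x: aggregate([m(x) for m in models])
```

For a numerical response the default `aggregate=mean` applies; for class labels, passing `aggregate=plurality_vote` together with a classifier-fitting function gives the voting variant.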
Partial Constraint Satisfaction
, 1992
Cited by 390 (22 self)
A constraint satisfaction problem involves finding values for variables subject to constraints on which combinations of values are allowed. In some cases it may be impossible or impractical to solve these problems completely. We may seek to partially solve the problem, in particular by satisfying a maximal number of constraints. Standard backtracking and local consistency techniques for solving constraint satisfaction problems can be adapted to cope with, and take advantage of, the differences between partial and complete constraint satisfaction. Extensive experimentation on maximal satisfaction problems illuminates the relative and absolute effectiveness of these methods. A general model of partial constraint satisfaction is proposed.
1 Introduction
Constraint satisfaction involves finding values for problem variables subject to constraints on acceptable combinations of values. Constraint satisfaction has wide application in artificial intelligence, in areas ranging from temporal r...
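One way to picture "satisfying a maximal number of constraints" is a backtracking search that tolerates violations and prunes any branch that cannot beat the best solution found so far. The sketch below assumes binary constraints given as `(var_a, var_b, predicate)` triples and a fixed variable order; `max_csp` and its argument layout are hypothetical and not taken from the paper.

```python
def max_csp(variables, domains, constraints):
    """Find an assignment violating as few constraints as possible.

    variables:   list of variable names (searched in this order)
    domains:     dict mapping each variable to an iterable of values
    constraints: list of (var_a, var_b, predicate) triples, where
                 predicate(value_a, value_b) is True when satisfied
    Returns (assignment, number_of_violated_constraints).
    """
    best = {"assignment": None, "violations": len(constraints) + 1}

    def newly_violated(assignment, var):
        # Count violated constraints that became fully assigned when `var`
        # received its value; each constraint is therefore counted once.
        count = 0
        for a, b, satisfied in constraints:
            if var in (a, b) and a in assignment and b in assignment:
                if not satisfied(assignment[a], assignment[b]):
                    count += 1
        return count

    def search(index, assignment, violations):
        if violations >= best["violations"]:
            return  # bound: this branch cannot improve on the incumbent
        if index == len(variables):
            best["assignment"] = dict(assignment)
            best["violations"] = violations
            return
        var = variables[index]
        for value in domains[var]:
            assignment[var] = value
            search(index + 1, assignment, violations + newly_violated(assignment, var))
            del assignment[var]

    search(0, {}, 0)
    return best["assignment"], best["violations"]
```

As a usage example, colouring a triangle with two colours is unsatisfiable as a complete problem, but the search still returns an assignment that breaks only one of the three inequality constraints.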
Exploratory projection pursuit
- Journal of the American Statistical Association
, 1987
Cited by 206 (0 self)
Exploratory projection pursuit is concerned with finding relatively highly revealing lower dimensional projections of high dimensional data. The intent is to discover views of the multivariate data set that exhibit nonlinear effects (clustering, concentrations near nonlinear manifolds) that are not captured by the linear correlation structure. This paper presents a new algorithm for this purpose that has both statistical and computational advantages over previous methods. A connection to density estimation is established. Examples are presented and issues related to practical application are discussed.
Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting
- Image and Vision Computing
, 1997
Cited by 153 (5 self)
Almost all problems in computer vision are related in one form or another to the problem of estimating parameters from noisy data. In this tutorial, we present what are probably the most commonly used techniques for parameter estimation. These include linear least-squares (pseudo-inverse and eigen analysis); orthogonal least-squares; gradient-weighted least-squares; bias-corrected renormalization; Kalman filtering; and robust techniques (clustering, regression diagnostics, M-estimators, least median of squares). Particular attention has been devoted to discussions about the choice of appropriate minimization criteria and the robustness of the different techniques. Their application to conic fitting is described.
Key-words: Parameter estimation, Least-squares, Bias correction, Kalman filtering, Robust regression
Unité de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex (France)
Network Externalities in Microcomputer Software: An Econometric Analysis of the Spreadsheet Market
- Management Science
, 1996
Cited by 115 (2 self)
Because of network externalities, the success of a software product may depend in part on the size of its installed base and its conformance to industry standards. This research builds a hedonic model to determine the effects of network externalities, standards, intrinsic features and a time trend on microcomputer spreadsheet software prices. When data for a sample of products during the 1987-1992 time period were analyzed using this model, four main results emerged: 1) Network externalities, as measured by the size of a product's installed base, significantly increased the price of spreadsheet products: a one percent increase in a product's installed base was associated with a 0.75% increase in its price. 2) Products which adhered to the dominant standard, the Lotus menu tree interface, commanded prices which were higher by an average of 46%. 3) Although nominal prices increased slightly during this time period, quality-adjusted prices declined by an average of 16% per year. 4) The hed...
Institutions Rule: The Primacy of Institutions over Geography and Integration in Economic Development
- Free University of Berlin
, 2004
Cited by 113 (10 self)
We estimate the respective contributions of institutions, geography, and trade in determining income levels around the world, using recently developed instrumental variables for institutions and trade. Our results indicate that the quality of institutions “trumps” everything else. Once institutions are controlled for, conventional measures of geography have at best weak direct effects on incomes, although they have a strong indirect effect by influencing the quality of institutions. Similarly, once institutions are controlled for, trade is almost always insignificant, and often enters the income equation with the “wrong” (i.e., negative) sign. We relate our results to recent literature, and where differences exist, trace their origins to choices on samples, specification, and instrumentation.
The views expressed in this paper are the authors’ own and not of the institutions with which they are affiliated. We thank three referees, Chad Jones, James Robinson, Will Masters, and participants at the Harvard-MIT development seminar, joint IMF-World Bank Seminar, and the Harvard econometrics workshop for their comments, Daron Acemoglu for helpful conversations, and Simon Johnson for providing us with his data. Dani Rodrik gratefully acknowledges support from the Carnegie Corporation of New York.
Commerce and manufactures can seldom flourish long in any state which does not enjoy a regular administration of justice, in which the people do not feel themselves secure in the possession of their property, in which the faith of contracts is not supported by law, and in which the authority of the state is not supposed to be regularly employed in enforcing the payment of debts from all those who are able to pay. Commerce and manufactures, in short, can seldom flourish in any state in which there is not a certain degree of confidence in the justice of government. -- Adam Smith, Wealth of Nations I.
Nonparametric regression using Bayesian variable selection
- Journal of Econometrics
, 1996
Cited by 107 (8 self)
This paper estimates an additive model semiparametrically, while automatically selecting the significant independent variables and the appropriate power transformation of the dependent variable. The nonlinear variables are modeled as regression splines, with significant knots selected from a large number of candidate knots. The estimation is made robust by modeling the errors as a mixture of normals. A Bayesian approach is used to select the significant knots, the power transformation, and to identify outliers using the Gibbs sampler to carry out the computation. Empirical evidence is given that the sampler works well on both simulated and real examples and that in the univariate case it compares favorably with a kernel-weighted local linear smoother. The variable selection algorithm in the paper is substantially faster than previous Bayesian variable selection algorithms.
Key words: Additive model; Power transformation; Robust estimation
Map-reduce for machine learning on multicore
- In Proceedings of NIPS
, 2007
Cited by 83 (5 self)
We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain “summation form,” which allows them to be easily parallelized on multicore computers. We adapt Google’s map-reduce [7] paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression
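The "summation form" idea can be illustrated with a case simpler than the algorithms listed in the abstract: ordinary least squares on one input, where the fit depends on the data only through a handful of sums. Each chunk of data yields partial sums (the map step) and the partial sums are added together (the reduce step). This is a minimal sketch under those assumptions; `partial_sums`, `add_sums`, and `fit_line` are hypothetical names, and the built-in `map` runs sequentially here, standing in for a pool of workers.

```python
from functools import reduce


def partial_sums(chunk):
    """Map step: sufficient statistics for one chunk of (x, y) pairs."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def add_sums(a, b):
    """Reduce step: partial sums combine by element-wise addition."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(chunks):
    """Least-squares slope and intercept from per-chunk sums.

    Assumes the x values are not all identical, so the denominator
    below is nonzero.
    """
    n, sx, sy, sxx, sxy = reduce(add_sums, map(partial_sums, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Because each call to `partial_sums` touches only its own chunk, the `map` call could be replaced by a parallel map over processes or threads without changing the result, which is the property the abstract relies on.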
Incremental Online Learning in High Dimensions
- Neural Computation
, 2005
Cited by 67 (12 self)
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high dimensional spaces.
Business Value of Information Technology: A Study of Electronic Data Interchange
, 1995
Cited by 65 (1 self)
A great deal of controversy exists about the impact of information technology on firm performance. While some authors have reported positive impacts, others have found negative or no impacts. This study focuses on Electronic Data Interchange (EDI) technology. Many of the problems in this line of research are overcome in this study by conducting a careful analysis of the performance data of the past decade gathered from the assembly centers of Chrysler Corporation. This study estimates the dollar benefits of improved information exchanges between Chrysler and its suppliers that result from using EDI. After controlling for variations in operational complexity arising from mix, volume, parts complexity, model, and engineering changes, the savings per vehicle that result from improved information exchanges are estimated to be about $60. Including the additional savings from electronic document preparation and transmission, the total benefits of EDI per vehicle amount to over $100. System wide, this translates to annual savings of $220 million for the company.

