Results 1 - 10 of 94,233
Supervised and unsupervised discretization of continuous features
- in A. Prieditis & S. Russell, eds, Machine Learning: Proceedings of the Twelfth International Conference
, 1995
"... Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised dis ..."
Cited by 534 (11 self)
discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive
Information Technology, Workplace Organization, and the Demand for Skilled Labor: Firm-Level Evidence
- Journal of Economics
, 2002
"... We investigate the hypothesis that the combination of three related innovations—1) information technology (IT), 2) complementary workplace reorganization, and 3) new products and services — constitute a significant skill-biased technical change affecting labor demand in the United States. Using detai ..."
Cited by 589 (15 self)
We investigate the hypothesis that the combination of three related innovations—1) information technology (IT), 2) complementary workplace reorganization, and 3) new products and services — constitute a significant skill-biased technical change affecting labor demand in the United States. Using
The separation of ownership and control in East Asian Corporations
- Journal of Financial Economics
, 2000
"... We examine the separation of ownership and control for 2,980 corporations in nine East Asian countries. In all countries, voting rights frequently exceed cash-flow rights via pyramid structures and cross-holdings. The separation of ownership and control is most pronounced among family-controlled ..."
Cited by 573 (17 self)
dispersed over time. Finally, significant corporate wealth in East Asia is concentrated among a few families. © 2000 Elsevier Science S.A. All rights reserved. JEL classification: G32; L22
Combining labeled and unlabeled data with co-training
, 1998
"... We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the ta ..."
Cited by 1614 (34 self)
provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice. As part of our analysis, we provide new re-
Peer Effects with Random Assignment: Results for Dartmouth Roommates
, 2001
"... This paper uses a unique data set to measure peer effects among college roommates. Freshman year roommates and dormmates are randomly assigned at Dartmouth College. I find that peers have an impact on grade point average and on decisions to join social groups such as fraternities. Residential peer e ..."
Cited by 523 (6 self)
This paper uses a unique data set to measure peer effects among college roommates. Freshman year roommates and dormmates are randomly assigned at Dartmouth College. I find that peers have an impact on grade point average and on decisions to join social groups such as fraternities. Residential peer effects are markedly absent in other major life decisions such as choice of college major. Peer effects in GPA occur at the individual room level, whereas peer effects in fraternity membership occur both at the room level and the entire dorm level. Overall, the data provide strong evidence for the existence of peer effects in student outcomes.
Liquidity Risk and Expected Stock Returns
, 2002
"... This study investigates whether market-wide liquidity is a state variable important for asset pricing. We find that expected stock returns are related cross-sectionally to the sensitivities of returns to fluctuations in aggregate liquidity. Our monthly liquidity measure, an average of individual-sto ..."
Cited by 590 (4 self)
This study investigates whether market-wide liquidity is a state variable important for asset pricing. We find that expected stock returns are related cross-sectionally to the sensitivities of returns to fluctuations in aggregate liquidity. Our monthly liquidity measure, an average of individual-stock measures estimated with daily data, relies on the principle that order flow induces greater return reversals when liquidity is lower. Over a 34-year period, the average return on stocks with high sensitivities to liquidity exceeds that for stocks with low sensitivities by 7.5% annually, adjusted for exposures to the market return as well as size, value, and momentum factors.
Wrappers for Feature Subset Selection
- AIJ SPECIAL ISSUE ON RELEVANCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Cited by 1522 (3 self)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
Solving multiclass learning problems via error-correcting output codes
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1995
"... Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to multiclass l ..."
Cited by 730 (8 self)
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
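As a rough illustration of the idea in the abstract above, the following Python sketch shows only the decoding step of an error-correcting output code: each class gets a binary codeword, one binary learner would predict each bit, and the predicted bit string is mapped to the class with the nearest codeword in Hamming distance. The four class labels, the 6-bit codewords, and the function names are all assumptions made for this example and are not taken from the paper; the binary learners themselves are not shown.

```python
# Hypothetical 6-bit codewords for four classes "a".."d" (not from the paper).
# Any two codewords differ in exactly four positions.
CODEWORDS = {
    "a": (0, 0, 0, 0, 0, 0),
    "b": (1, 1, 1, 1, 0, 0),
    "c": (0, 0, 1, 1, 1, 1),
    "d": (1, 1, 0, 0, 1, 1),
}


def hamming(u, v):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODEWORDS, key=lambda label: hamming(CODEWORDS[label], bits))


# Suppose six binary learners (assumed, not shown) predicted these bits for an
# example of class "b", with the second learner making a mistake.
predicted = (1, 0, 1, 1, 0, 0)
print(decode(predicted))  # prints "b"
```

Because the codewords in this sketch are at Hamming distance four from one another, a single wrong bit still decodes to the intended class, which is the error-correcting property the abstract refers to.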
Fast Effective Rule Induction
, 1995
"... Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error r ..."
Cited by 1257 (21 self)
Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.
Improved algorithms for optimal winner determination in combinatorial auctions and generalizations
, 2000
"... Combinatorial auctions can be used to reach efficient resource and task allocations in multiagent systems where the items are complementary. Determining the winners is NP-complete and inapproximable, but it was recently shown that optimal search algorithms do very well on average. This paper present ..."
Cited by 598 (55 self)
Combinatorial auctions can be used to reach efficient resource and task allocations in multiagent systems where the items are complementary. Determining the winners is NP-complete and inapproximable, but it was recently shown that optimal search algorithms do very well on average. This paper presents a more sophisticated search algorithm for optimal (and anytime) winner determination, including structural improvements that reduce search tree size, faster data structures, and optimizations at search nodes based on driving toward, identifying and solving tractable special cases. We also uncover a more general tractable special case, and design algorithms for solving it as well as for solving known tractable special cases substantially faster. We generalize combinatorial auctions to multiple units of each item, to reserve prices on singletons as well as combinations, and to combinatorial exchanges -- all allowing for substitutability. Finally, we present algorithms for determining the winners in these generalizations.