Random Early Detection Gateways for Congestion Avoidance
 IEEE/ACM TRANSACTIONS ON NETWORKING, 1993
Abstract

Cited by 2699 (31 self)
This paper presents Random Early Detection (RED) gateways for congestion avoidance in packet-switched networks. The gateway detects incipient congestion by computing the average queue size. The gateway could notify connections of congestion either by dropping packets arriving at the gateway or by setting a bit in packet headers. When the average queue size exceeds a preset threshold, the gateway drops or marks each arriving packet with a certain probability, where the exact probability is a function of the average queue size. RED gateways keep the average queue size low while allowing occasional bursts of packets in the queue. During congestion, the probability that the gateway notifies a particular connection to reduce its window is roughly proportional to that connection's share of the bandwidth through the gateway. RED gateways are designed to accompany a transport-layer congestion control protocol such as TCP. The RED gateway has no bias against bursty traffic and avoids the global synchronization of many connections decreasing their windows at the same time. Simulations of a TCP/IP network are used to illustrate the performance of RED gateways.
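The drop/mark rule the abstract describes can be sketched as follows. The parameter names (wq, min_th, max_th, max_p) and the example values are conventional RED notation, not taken verbatim from this abstract:

```python
# Sketch of RED's marking decision; parameters are illustrative assumptions.

def update_avg(avg, queue_len, wq=0.002):
    """Exponentially weighted moving average of the instantaneous queue size."""
    return (1 - wq) * avg + wq * queue_len

def mark_probability(avg, min_th=5, max_th=15, max_p=0.02):
    """Drop/mark probability as a function of the average queue size."""
    if avg < min_th:
        return 0.0          # no incipient congestion: accept the packet
    if avg >= max_th:
        return 1.0          # sustained congestion: drop/mark every arrival
    # between the thresholds, probability rises linearly with the average
    return max_p * (avg - min_th) / (max_th - min_th)
```

Because the decision depends on the smoothed average rather than the instantaneous queue length, short bursts pass through unmarked, which is how RED avoids a bias against bursty traffic.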
Wrappers for Feature Subset Selection
 AIJ SPECIAL ISSUE ON RELEVANCE, 1997
Abstract

Cited by 1522 (3 self)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive Bayes.
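A minimal sketch of the wrapper idea: greedy forward search in which each candidate subset is scored by the wrapped learner itself. The `eval_subset` callback (which in practice would run cross-validated training of the target learner) is a hypothetical interface, not an API from the paper:

```python
# Greedy forward selection driven by the learner's own score (wrapper approach).
# eval_subset(subset) -> float is an assumed interface; higher is better.

def forward_select(features, eval_subset):
    """Repeatedly add the single feature that most improves the score."""
    selected, best_score = [], eval_subset([])
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            score = eval_subset(selected + [f])
            if score > best_score:
                best_score, best_f, improved = score, f, True
        if improved:
            selected.append(best_f)
    return selected, best_score
```

Because every candidate subset is evaluated with the actual induction algorithm, the search automatically accounts for how that algorithm and the training set interact, which is the point of the wrapper approach over filters such as Relief.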
The strength of weak learnability
 Machine Learning, 1990
Abstract

Cited by 861 (24 self)
This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
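The weak-to-strong conversion can be illustrated with a later, simpler realization of the same idea: AdaBoost-style reweighting and weighted voting, rather than the recursive majority-vote construction this particular paper describes. `weak_learn` is an assumed interface returning a ±1-valued hypothesis:

```python
import math

# AdaBoost-style boosting sketch (a descendant of this paper's construction).
# weak_learn(examples, labels, weights) -> hypothesis is an assumed interface.

def boost(examples, labels, weak_learn, rounds=10):
    n = len(examples)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = weak_learn(examples, labels, w)
        err = sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
        if err == 0:
            ensemble.append((1.0, h))   # perfect weak hypothesis, stop early
            break
        if err >= 0.5:
            break                       # no longer better than random guessing
        alpha = 0.5 * math.log((1 - err) / err)
        # reweight: mistakes get heavier, correct predictions lighter
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, x, y in zip(w, examples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
        ensemble.append((alpha, h))

    def strong(x):
        s = sum(a * h(x) for a, h in ensemble)
        return 1 if s >= 0 else -1
    return strong
```

Each round forces the next weak hypothesis to focus on examples the ensemble currently gets wrong, which is the mechanism by which "slightly better than random" accumulates into arbitrarily high accuracy.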
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
 JOURNAL OF MACHINE LEARNING RESEARCH, 2000
Abstract

Cited by 560 (20 self)
We present a unifying framework for studying the solution of multiclass categorization problems by reducing them to multiple binary problems that are then solved using a margin-based binary learning algorithm. The proposed framework unifies some of the most popular approaches in which each class is compared against all others, or in which all pairs of classes are compared to each other, or in which output codes with error-correcting properties are used. We propose a general method for combining the classifiers generated on the binary problems, and we prove a general empirical multiclass loss bound given the empirical loss of the individual binary learning algorithms. The scheme and the corresponding bounds apply to many popular classification learning algorithms including support-vector machines, AdaBoost, regression, logistic regression and decision-tree algorithms. We also give a multiclass generalization error analysis for general output codes with AdaBoost as the binary learner. Experimental results with SVM and AdaBoost show that our scheme provides a viable alternative to the most commonly used multiclass algorithms.
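The simplest instance of the framework, one-vs-all, can be sketched as follows: one binary margin classifier per class, with prediction by the largest score. `binary_learn` is an assumed interface returning a real-valued scoring function, not an API from the paper:

```python
# One-vs-all reduction of multiclass to binary (one instance of the framework).
# binary_learn(X, labels) -> scorer is an assumed interface; labels are +/-1.

def train_one_vs_all(X, y, classes, binary_learn):
    """Train one binary scorer per class (positive examples = that class)."""
    return {c: binary_learn(X, [1 if yi == c else -1 for yi in y])
            for c in classes}

def predict(scorers, x):
    """Predict the class whose binary scorer reports the largest margin."""
    return max(scorers, key=lambda c: scorers[c](x))
```

The all-pairs and error-correcting output-code reductions differ only in which +/-1 (or 0) labelings are handed to `binary_learn` and in how the binary scores are combined.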
Mining highspeed data streams
2000
Cited by 391 (10 self)
Concentration Of Measure And Isoperimetric Inequalities In Product Spaces
1995
Abstract

Cited by 383 (4 self)
The concentration of measure phenomenon in product spaces roughly states that, if a set A in a product Ω^N of probability spaces has measure at least one half, "most" of the points of Ω^N are "close" to A. We proceed to a systematic exploration of this phenomenon. The meaning of the word "most" is made rigorous by isoperimetric-type inequalities that bound the measure of the exceptional sets. The meaning of the word "close" is defined in three main ways, each of them giving rise to related, but different inequalities. The inequalities are all proved through a common scheme of proof. Remarkably, this simple approach not only yields qualitatively optimal results, but, in many cases, captures near optimal numerical constants. A large number of applications are given, in particular to Percolation, Geometric Probability, Probability in Banach Spaces, to demonstrate in concrete situations the extremely wide range of application of the abstract tools. AMS Classification numbers: Primary 60E15, 28A35, 60G99; Secondary 60G15, 68C15.
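One standard way the "most/close" statement is made rigorous is the convex-distance inequality, stated here in its usual textbook form rather than quoted from the paper; the notation d_T for Talagrand's convex distance is an assumption of this sketch:

```latex
% For a product probability measure $P$ on $\Omega^N$ and any measurable
% $A \subseteq \Omega^N$, with $d_T$ the convex distance:
\[
  P(A)\, P\bigl(\{x : d_T(x, A) \ge t\}\bigr) \le e^{-t^2/4} .
\]
```

When P(A) ≥ 1/2, the bound forces the set of points at convex distance t or more from A to have measure at most 2e^{-t²/4}, which is the precise sense in which "most" points are "close" to A.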
Online Aggregation
1997
Abstract

Cited by 378 (44 self)
Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in postgres.
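A running confidence interval of the kind mentioned above can be sketched with the textbook normal approximation; the formulas below are the standard CLT-based estimate, not the estimators derived in the paper:

```python
import math

# Running ~95% confidence interval for an online AVG over a randomly ordered
# stream of tuples, using Welford's online variance recurrence.

def running_avg_ci(stream, z=1.96):
    """Yield (running mean, half-width of the CI) after each new value."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)          # accumulates sum of squared deviations
        var = m2 / (n - 1) if n > 1 else 0.0
        yield mean, z * math.sqrt(var / n)
```

The interval's half-width shrinks like 1/√n, which is what lets a user watch the estimate tighten and stop the query early once it is precise enough.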
Mining timechanging data streams
 IN PROC. OF THE 2001 ACM SIGKDD INTL. CONF. ON KNOWLEDGE DISCOVERY AND DATA MINING, 2001
Abstract

Cited by 319 (5 self)
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
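The split decisions underlying the VFDT/CVFDT family rest on the Hoeffding bound, which says how many examples suffice to trust an observed difference between attributes. A minimal sketch, with conventional parameter names not lifted from the paper (R is the range of the split criterion, e.g. log2 of the number of classes for information gain; delta is the allowed error probability):

```python
import math

# Hoeffding-bound split test in the VFDT style (illustrative sketch).

def hoeffding_bound(R, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a
    variable with range R is within eps of its true mean."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, R, delta, n):
    """Split once the best attribute beats the runner-up by more than eps."""
    return (best_gain - second_gain) > hoeffding_bound(R, delta, n)
```

Because eps shrinks as 1/√n, a node can commit to a split after seeing only a small sample, which is what makes one-pass, O(1)-per-example tree induction possible on a stream.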
Latent semantic indexing: A probabilistic analysis
1998
Abstract

Cited by 315 (7 self)
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
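The random-projection speed-up can be sketched as follows: project the term dimension down with a random matrix, then run the rank-k spectral step on the much smaller result. The Gaussian projection and all dimensions here are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

# Random projection followed by truncated SVD (LSI-style), illustrative sizes.

rng = np.random.default_rng(0)
A = rng.random((1000, 200))              # term-document matrix (terms x docs)

d = 50                                   # reduced dimension for the projection
R = rng.normal(0, 1.0 / np.sqrt(d), (d, 1000))
B = R @ A                                # each document projected to d dims

# Spectral step of LSI: rank-k truncated SVD, now on a 50x200 matrix
# instead of 1000x200.
k = 10
U, s, Vt = np.linalg.svd(B, full_matrices=False)
docs_k = np.diag(s[:k]) @ Vt[:k]         # k-dimensional document representations
```

The point of the analysis is that a Johnson-Lindenstrauss-style projection approximately preserves the geometry the SVD extracts, so the cheap projected SVD retrieves nearly what the full one would.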
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
2003
Abstract

Cited by 288 (4 self)
Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
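The style of algorithm the abstract describes, gradient steps followed by projection back into F, can be sketched in one dimension. The interval F = [-1, 1] and the step-size schedule 1/√t are illustrative assumptions of this sketch:

```python
import math

# Online gradient descent with projection onto a convex set F (here an
# interval). grads[t] is the gradient oracle of the cost revealed at step t.

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto F = [lo, hi]."""
    return max(lo, min(hi, x))

def online_gd(grads, x0=0.0):
    """Play x_t, observe the cost's gradient, step against it, project."""
    x = x0
    plays = []
    for t, g in enumerate(grads, start=1):
        plays.append(x)
        x = project(x - g(x) / math.sqrt(t))   # eta_t = 1/sqrt(t)
    return plays
```

Even though each cost function is revealed only after the point is chosen, the average cost of the plays approaches that of the best fixed point in hindsight, which is the regret guarantee analyzed in the paper.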