| Carlos Domingo, Ricard Gavaldà, and Osamu Watanabe. Adaptive sampling methods for scaling up knowledge discovery algorithms. Data Mining and Knowledge Discovery, 6(2):131--152, 2002. |
....a given quality criterion. Other recent examples of this approach include Maron and Moore's racing algorithm for model selection [13], Greiner's PALO algorithm for probabilistic hill climbing [7], Scheffer and Wrobel's sequential sampling algorithm [16], and Domingo et al.'s AdaSelect algorithm [2]. Our method goes beyond these in applying to any type of discrete search, providing new formal results, working within pre-specified memory limits, supporting interleaving of search steps, learning from time-changing data, etc. A related approach is progressive sampling [14, 15], where ....
C. Domingo, R. Gavalda, and O. Watanabe. Adaptive sampling methods for scaling up knowledge discovery algorithms. Data Mining and Knowledge Discovery, 6:131--152, 2002.
.... of data and making the guarantee dependent on the observed empirical utility values (e.g. Freund, 1998; Langford & McAllester, 2000) or demanding a certain fixed quality and making the number of examples dependent on the observed utility values (Wald, 1947; Maron & Moore, 1994; Greiner, 1996; Domingo et al., 1999) (this is often referred to as sequential sampling). In this paper, we generalize known sampling results in two respects. Firstly, in many cases, it is more natural for a user to ask for the n best solutions instead of the single best or all hypotheses above a threshold. Secondly, and more ....
....is near optimal (or very poor, respectively). The incremental greedy learning algorithm Palo (Greiner, 1996) has been reported to require many times fewer examples than the worst case bounds suggest. In a KDD context, similar improvements have been achieved with the sequential algorithm of Domingo et al. (1999). 3. Generalized Problem Setting We generalize the above results in two respects. First, in many cases, it is more natural for a user to ask for the n best solutions instead of the single best or all hypotheses above a threshold. For instance, a user might find a small number of the most ....
Domingo, C., Gavalda, R., & Watanabe, O. (1999). Adaptive sampling methods for scaling up knowledge discovery algorithms (Technical Report TR-C131). Dept. de LSI, Politecnica de Catalunya.
....respectively). The incremental greedy learning algorithm Palo [11] has been reported to require many times fewer examples than the worst case bounds suggest. In the context of knowledge discovery in databases, too, sequential sampling algorithms can reduce the required amount of data significantly [12, 6]. These existing sampling algorithms address discovery problems where the goal is to select from a space of possible hypotheses H one of the elements with maximal value of an instance-averaging quality function f, or all elements with an f value above a user-given threshold (e.g. all ....
....resulting in relatively complex evaluation functions (see, e.g. [18] for an overview). In light of the large range of existing and possible future utility functions and in order to avoid unduly restricting our algorithm, we will not make syntactic assumptions about f. In particular, unlike [6], we will not assume that f is a single probability nor that it is (a function of) an average of instance properties. Instead, we only assume that it is possible to determine a confidence interval f that bounds the possible difference between true utility (on the whole database) and estimated ....
C. Domingo, R. Gavalda, and O. Watanabe. Adaptive sampling methods for scaling up knowledge discovery algorithms. Technical Report TR-C131, Dept. de LSI, Politecnica de Catalunya, 1999.
....knowledge discovery in databases where often much more data are available than can be processed. A non-sequential sampling algorithm for KDD has been presented by Toivonen [17]; a sequential algorithm (that imposes further restrictions on f and possesses an additional parameter) by Domingo et al. [5, 6]. So far, all sampling algorithms have been restricted to instance-averaging utility functions (such as error probabilities) and to finding a single approximately best hypothesis. For the subgroup discovery problem, utility functions are used that combine generality and a distributional property ....
C. Domingo, R. Gavalda, and O. Watanabe. Adaptive sampling methods for scaling up knowledge discovery algorithms. Technical Report TR-C131, Dept. de LSI, Politecnica de Catalunya, 1999.
....a polynomial of degree 4 to a polynomial of degree 2 in the worst case, a significant reduction. Adaptive sampling has been studied for a long time (see, for instance, the book by Wald [9]) and has also been recently used in the context of database query estimation [7] and knowledge discovery [1, 2]. Furthermore, adaptivity is a very desirable property for an algorithm that is expected to be used in practical applications. See the discussion about the relevance of adaptivity in the context of learning and discovery science in [10]. As noted by the authors, a practical implementation based ....
Carlos Domingo, Ricard Gavalda and Osamu Watanabe. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Tech Rep. C-131, Dept. of Math and Computing Science, Tokyo Institute of Technology. April, 1999.
....need to develop efficient algorithms for handling huge data sets. Random sampling is one of the important algorithmic methods for processing huge data sets. In this paper, we explain some random sampling techniques for speeding up learning algorithms and making them applicable to large data sets [15, 16, 4, 3]. We also show some algorithms obtained by using these techniques. 1 Introduction For knowledge discovery, or more specifically, for a certain kind of data mining task, it would be quite helpful if we can use learning algorithms on a very large data set. In this paper, I explain some random ....
....I cannot explain examples in detail; I cannot even cite and list related papers either. Please refer to a coming book reporting our achievements in the Discovery Science Project that will be published by Springer. On the other hand, for those explained in the following sections, please refer to [15, 16, 4, 3] as their original sources. # A part of this work is supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research on Priority Areas (Discovery Science) 1998-2001. 2 Osamu Watanabe The ultimate goal of computer science is to provide good computer ....
C. Domingo, R. Gavalda, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, in Proc. the 2nd Intl. Conf. on Discovery Science (DS'99), Lecture Notes in AI, 172--183, 1999. (The final version will appear in J. Knowledge Discovery and Data Mining.)
....Project, the author and his colleagues, Carlos Domingo and Ricard Gavalda, have developed simple and easy-to-use sampling techniques that would help us scale up algorithms designed for machine learning and data mining. For studying our sampling techniques, see, for example, technical papers [DGW01]. Also see survey papers [Wat00a, Wat00b] for adaptive sampling techniques in general. A common feature of these sampling techniques is adaptivity. Tuning up algorithms is important for making them applicable to heavy computational tasks. It often needs, however, expert knowledge on ....
....h t that the weak learner produces has advantage approximately # t with high probability. Of course, we assume here the correctness of the weak learner. But we would like to obtain such a hypothesis without knowing # lb in advance. This can be achieved by our adaptive sampling algorithms [DGW01]. The basic idea is simple and in fact the same as before. We use doubling and cross validation. To be concrete, let us consider here the problem of estimating the advantage # # of a given weak hypothesis h (that is obtained at some boosting round). We estimate the advantage 8 of h by ....
C. Domingo, R. Gavalda, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, Data Mining and Knowledge Discovery (special issue edited by H. Liu and H. Motoda), 2001, to appear.
.... use it for the filtering framework using the weights as defined in Freund and Schapire (1997). Unfortunately, one can show that the probability that the filter accepts one example becomes exponentially small and thus, it might take a long time to generate a new sample at each iteration (see Domingo and Watanabe (1999) for a formal proof of this). The reason is that, due to its weighting scheme, after some iterations, AdaBoost concentrates most of the weight under the current distribution in very few instances (assuming that the initial distribution over the training set was uniform). This problem was addressed ....
....the training set was uniform). This problem was addressed by Watanabe (1999), where it was proposed to use the initial value of the weights as a saturation bound so that they cannot grow uncontrolled as happens in AdaBoost, and a partial theoretical justification of its properties was provided. Domingo and Watanabe (1999) further modify the weighting scheme so that a formal proof of convergence could be obtained. More precisely, they proved that this modified algorithm is still a boosting algorithm in the PAC sense. We will denote both of these modifications by MadaBoost. They also proved that MadaBoost can ....
Domingo, C., Gavalda, R. and Watanabe, O. (1999). Adaptive sampling methods for scaling up knowledge discovery algorithms, in Proc. of the Second International Conference on Discovery Science, DS'99, LNAI 1721, 172--181.
....t The situation is different from estimating W t 1 , and a straightforward application of a convergence bound like the Chernoff and Hoeffding bounds does not work. Fortunately, we can make use of more sophisticated estimation methods called adaptive sampling techniques that have been proposed in [DGW98, DGW99]. An adaptive sampling technique for estimating the advantage is explicitly explained in a survey paper [Wat00]. By using one of these techniques, it is possible to obtain a desired # # t from O( 1 # 2 t ) ln(1 # t ) examples randomly generated by EXD,f# . Now summarizing the above ....
C. Domingo, R. Gavalda, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, in Proc. of the Second International Conference on Discovery Science, DS'99, Lecture Notes in Artificial Intelligence, 1999, to appear.
....overestimated size. Secondly, since a sample is not a priori fixed, we can run the weak learner on random samples of appropriate sizes at each iteration of the boosting; in this way, we can reduce the computation time particularly when the dataset is very large and we use appropriate sampling [DGW99] for scaling up the weak learner. Recall that AdaBoost is defined for the subsampling framework and that it is not appropriate for the filtering framework, at least not in an obvious way. Note that there are boosting algorithms, namely, the one proposed by Schapire [Sch90] and the one by Freund ....
....## t of # t The situation is different from estimating W t 1 , and a straightforward application of a convergence bound like the Hoeffding bound does not work. Fortunately, we can make use of more sophisticated estimation methods called adaptive sampling techniques that have been proposed in [DGW99, Wat00]. By using one of these techniques, it is possible to obtain a desired # # t from O( 1 # 2 t ) ln(1 # t ) examples randomly generated by EXD,f# . Now summarizing the above discussion, we have the following theorem. Theorem 5 Suppose that for given inputs # 0 and # 1, the algorithm ....
C. Domingo, R. Gavalda, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, in Proc. of the Second International Conference on Discovery Science, DS'99, Lecture Notes in Artificial Intelligence 1721, 172--183, 1999.
....dataset at each iteration to determine which stump we pass to the boosting algorithm we will use only a portion of it. Now, the problem is shifted to deciding how much data we need at each iteration. To solve this problem we will use an adaptive sampling method proposed by Domingo et al. [3, 4] that is particularly suitable for this problem. Adaptive sampling methods do not determine the sample size a priori. Instead, they obtain examples incrementally and decide online, depending on the current situation, when to stop sampling. Adaptive sampling methods have been studied in ....
....$h \in H_{DS}$, $U(h, S) = \|\{x \in S : h \text{ classifies } x \text{ correctly}\}\|/t - 1/2$; until ($\exists h \in H_{DS}$ such that $U(h, S)$ t ) output $h_0 \in H_{DS}$ with largest $U(h, S)$ output 1 $U(h_0, S)$ as an estimation of $h_0$'s error prob. Fig. 2. The Decision Stump Selector SDS. we will use an algorithm proposed by Domingo et al. in [3, 4]. The algorithm is described in Figure 2 and we discuss it in the following. The algorithm, denoted by SDS, receives as input the set $H_{DS}$ and a confidence parameter. It randomly obtains examples from dataset X using procedure FiltEx described in Section 2. Every time a new example is obtained, it ....
Domingo, C., Gavalda, R. and Watanabe, O., 1999. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. To appear in Proceedings of the Second International Conference on Discovery Science.
....it can find a weak hypothesis with very high accuracy, it does not need many examples and, on the other hand, when the best hypothesis in the class of hypotheses used by the algorithm has accuracy close to half, it needs more examples to find it. Algorithms with this property have been studied in [6, 7]. In this case, the weak learning algorithm can benefit from not using the same sample size all the time and choosing adaptively the most suitable at every boosting step. Another advantage of using adaptive weak learners is that they are very suitable to be used in situations where large amounts of data ....
....ensure that the probability of error of the algorithm is less than δ as required. We omit here a detailed explanation of how this can be done; it will be given in a future version. We would just like to point out that the algorithms for hypothesis selection and online adaptive sampling proposed in [6, 7] seem to fit particularly well the situation we encounter here. 5 Noise Tolerance We have argued that one of the drawbacks of AdaBoost is that it does not seem to be noise resistant. While from the theoretical side nothing is known about it, recent work of Dietterich [4] provides a reasonable ....
Carlos Domingo, Ricard Gavaldà and Osamu Watanabe. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Tech Rep. C-131, Dept. of Math and Computing Science, Tokyo Institute of Technology. April, 1999.
....appropriate sample size for the current input data set. With this motivation, adaptive sampling techniques have been proposed in computer science. See Remark 1 below for statistics. Lipton et al. [10, 11] proposed adaptive sampling algorithms for relational databases. More recently, Domingo et al. [3, 5, 4] proposed more general adaptive sampling algorithms for rule selection problems. This paper explains key ideas of these two types of adaptive sampling techniques. In this paper, we fix one simple problem and explain sampling techniques. Let us specify this problem. We consider the following simple ....
....to make a real number integral. We ask the reader to refer to the original papers for details. The algorithm explained in Section 3 is from the query size estimation algorithm of Lipton et al. [10]. The algorithm explained in Section 4 is based on the adaptive rule selection algorithm of Domingo et al. [5, 4]. 2 Batch Sampling and Statistical Bounds The simplest random sampling algorithm for estimating p B is to pick some elements of D randomly and estimate the average value of B on these selected examples. This is our first sampling algorithm and we call it Batch Sampling (Figure 1). Even this ....
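As a rough illustration of the batch sampling idea quoted above, the following Python sketch draws a fixed number of elements from D and averages a 0/1 predicate B over them. The sample size formula is an assumption taken from the standard two-sided Hoeffding bound for additive error; the quoted paper's own Figure 1 is not reproduced in the snippet and may use a different bound. The names `hoeffding_sample_size` and `batch_sampling` are hypothetical.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Smallest n with 2*exp(-2*n*eps^2) <= delta (assumed Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def batch_sampling(D, B, eps, delta, rng=random):
    """Estimate p_B, the fraction of D on which predicate B holds.

    The sample size is fixed in advance from eps and delta only, which is
    what distinguishes batch sampling from the adaptive methods.
    """
    n = hoeffding_sample_size(eps, delta)
    # Sample uniformly with replacement and count how often B is true.
    hits = sum(1 for _ in range(n) if B(rng.choice(D)))
    return hits / n
```

Because n depends only on eps and delta, the same number of draws is used whether p_B is near 0, near 1/2, or near 1; adaptive methods aim to stop earlier when the observed data allow it.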
C. Domingo, R. Gavalda, and O. Watanabe, "Adaptive sampling methods for scaling up knowledge discovery algorithms (the 2nd version)," in Proceedings of the Second International Conference on Discovery Science, to appear.
....appropriate sample size for the current input data set. With this motivation, adaptive sampling techniques have been proposed in computer science. See Remark 1 below for statistics. Lipton et al. [10, 11] proposed adaptive sampling algorithms for relational databases. More recently, Domingo et al. [3, 5, 4] proposed more general adaptive sampling algorithms for rule selection problems. This paper explains key ideas of these two types of adaptive sampling techniques. In this paper, we fix one simple problem and explain sampling techniques. Let us specify this problem. We consider the following simple ....
....to make a real number integral. We ask the reader to refer to the original papers for details. The algorithm explained in Section 3 is from the query size estimation algorithm of Lipton et al. [10]. The algorithm explained in Section 4 is based on the adaptive rule selection algorithm of Domingo et al. [5, 4]. 2 Batch Sampling and Statistical Bounds The simplest random sampling algorithm for estimating p B is to pick some elements of D randomly and estimate the average value of B on these selected examples. This is our first sampling algorithm and we call it Batch Sampling (Figure 1). Even this ....
C. Domingo, R. Gavalda, and O. Watanabe, "Adaptive sampling methods for scaling up knowledge discovery algorithms," Research Report C-136, Dept. of Math. and Computing Sciences, Tokyo Institute of Technology, 1999. (Available at www.is.titech.ac.jp/research/research-report/C/)
....Here is one problem. In order to calculate the size of S by using any of these statistical bounds, we need to know how well the best hypothesis performs on X, but this is usually hard to know in advance. Fortunately, though, there is an adaptive way to do this sampling. Recently, we developed [DGW99] a general method to determine sample size based on the performance of hypotheses on observed examples, and by using this method, we can build an adaptive random sampling algorithm AdaSelect for hypothesis selection. We use this algorithm for our weak learner. We cannot simply combine ....
....propose an algorithm AdaSelect described in Figure 1. Here for any set of examples, we use $\mathrm{adv}_S(h)$ to denote the advantage of h on S; that is, $\mathrm{adv}_S(h) \stackrel{\mathrm{def}}{=} \frac{\|\{(x, b) \in S : h(x) = b\}\|}{\|S\|} - \frac{1}{2}$. For this algorithm, we can prove the following properties, which are derived immediately from [DGW99, Theorem 3 and Theorem 4]. Algorithm AdaSelect Given: X, H, and $EX_{D,f}$. Let $N = \|H\|$. Input: $\epsilon$ and $\delta$, where $0 < \epsilon < 1$ and $0 < \delta < 1$. begin (initially set $t = 0$ and $S = \emptyset$) repeat $t \leftarrow t + 1$; $(x, b) \leftarrow EX_{D,f}$; $S \leftarrow S \cup \{(x, b)\}$; $\alpha_t \leftarrow \sqrt{\ln(2Nt(t+1)/\delta)/(2t)}$ until $\exists h \in H$ [$\mathrm{adv}_S(h) \geq \alpha_t (2/\epsilon - 1)$] output $h \in H$ ....
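The stopping rule in the snippet above is partly lost in extraction, so the following Python sketch is only one plausible reading of it, not the authors' exact algorithm. It assumes the inputs are an accuracy parameter `eps` and a confidence parameter `delta`, that the threshold sequence is alpha_t = sqrt(ln(2Nt(t+1)/delta) / (2t)), and that sampling stops once some hypothesis has empirical advantage at least alpha_t * (2/eps - 1). The function name `ada_select`, the `draw_example` callback (standing in for the example oracle), and the `max_examples` safety cap are hypothetical additions.

```python
import math


def ada_select(H, draw_example, eps, delta, max_examples=1_000_000):
    """Adaptive hypothesis selection sketch (assumed reading of AdaSelect).

    H            : list of hypotheses, each a callable x -> label
    draw_example : callable returning one labelled example (x, b)
    Returns (selected hypothesis, number of examples used).
    """
    N = len(H)
    correct = [0] * N  # running count of correct predictions per hypothesis
    t = 0
    while t < max_examples:
        t += 1
        x, b = draw_example()
        for i, h in enumerate(H):
            if h(x) == b:
                correct[i] += 1
        # Assumed threshold sequence; it shrinks as t grows.
        alpha_t = math.sqrt(math.log(2 * N * t * (t + 1) / delta) / (2 * t))
        best = max(range(N), key=lambda i: correct[i])
        adv_best = correct[best] / t - 0.5  # empirical advantage on the sample
        if adv_best >= alpha_t * (2.0 / eps - 1.0):
            return H[best], t
    raise RuntimeError("no hypothesis met the stopping condition")
```

The sample size is not fixed in advance: a hypothesis with a large advantage triggers the stopping condition after few examples, while a class whose best hypothesis is close to random guessing needs many more.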
C. Domingo, R. Gavalda, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, Technical Report C-131, Dept. of Math. and Computing Sciences, Tokyo Institute of Technology, 1999.
....it can find a weak hypothesis with very high accuracy, it does not need many examples and, on the other hand, when the best hypothesis in the class of hypotheses used by the algorithm has accuracy close to half, it needs more examples to find it. Algorithms with this property have been studied in [6, 7]. In this case, the weak learning algorithm can benefit from not using the same sample size all the time and choosing adaptively the most suitable at every boosting step. Another advantage of using adaptive weak learners is that they are very suitable to be used in situations where large amounts of data ....
....ensure that the probability of error of the algorithm is less than δ as required. We omit here a detailed explanation of how this can be done; it will be given in a future version. We would just like to point out that the algorithms for hypothesis selection and online adaptive sampling proposed in [6, 7] seem to fit particularly well the situation we encounter here. 5 Noise Tolerance We have argued that one of the drawbacks of AdaBoost is that it does not seem to be noise resistant. While from the theoretical side nothing is known about it, recent work of Dietterich [4] provides a reasonable ....
Carlos Domingo, Ricard Gavalda and Osamu Watanabe. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Tech Rep. C-131, Dept. of Math and Computing Science, Tokyo Institute of Technology. April, 1999.