Laird, P. D., (1988), Learning from Good and Bad Data, Norwell MA: Kluwer Academic Publishers.
....m would solve the problem of learning DNF. Observe that the class considered by Frazier et al. is a generalization of the class of DNF formulas in which all variables only appear negated. While there has been some work addressing the general issue of mislabelled training examples in the PAC model [1, 18, 28, 17], there has been little research on learning geometric concepts with noise. Auer [3] investigates exact learning of boxes where some of the counterexamples, given in response to equivalence queries, are noisy. Auer shows that box n is learnable using hypotheses from box n if and only if the ....
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....a nonempty version space is generated or enough time has passed to believe that the approach is not computationally feasible on the given data with the given concept description language. 6 Theoretical Results. Recent theoretical work on concept learning (e.g. Valiant, 1984; Haussler, 1988; Laird, 1988) has developed techniques for analyzing how the quality of results is influenced by the amount of data used in learning. This section gives two such results for learning from data with bounded inconsistency. In Section 6.2, we analyze the sample complexity of the algorithm presented in this ....
....$1 - p$ of having the value failure. If $0 \le p \le 1$, $0 \le s \le 1$, and $m$ is any positive integer, then $\Pr[\text{at most } \lfloor (p-s)m \rfloor \text{ successes in } m \text{ trials}] \le e^{-2s^2m}$ and $\Pr[\text{at least } \lceil (p+s)m \rceil \text{ successes in } m \text{ trials}] \le e^{-2s^2m}$. Our analysis of sample complexity closely parallels Theorem 5.3 of Laird's thesis [Laird, 1988]. Two propositions are required. Proposition 1: $\mathrm{error}(t) \le \eta$. Proof: By definition, $t$ always agrees with an example of the form $\langle x, \mathrm{class}(x) \rangle$. So $t$ will agree with the oracle's example $\langle P(x), \mathrm{class}(x) \rangle$ if and only if $\mathrm{class}(P(x)) = \mathrm{class}(x)$, and disagree if and only if $\mathrm{class}(P(x)) \ne$ ....
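The two tail bounds in this snippet can be checked numerically. The following is a minimal Python sketch. It assumes the Hoeffding form $e^{-2s^2m}$ for the right-hand side and compares it with the exact binomial tails. The helper names `hoeffding_bound`, `lower_tail` and `upper_tail` are hypothetical and do not come from Laird's thesis.

```python
import math


def hoeffding_bound(s: float, m: int) -> float:
    """Assumed right-hand side e^{-2 s^2 m} for each one-sided tail."""
    return math.exp(-2.0 * s * s * m)


def lower_tail(p: float, s: float, m: int) -> float:
    """Exact Prob[at most floor((p - s) m) successes in m Bernoulli(p) trials]."""
    k_max = math.floor((p - s) * m)
    # An empty range (k_max < 0) correctly gives probability 0.
    return sum(math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(0, k_max + 1))


def upper_tail(p: float, s: float, m: int) -> float:
    """Exact Prob[at least ceil((p + s) m) successes in m Bernoulli(p) trials]."""
    k_min = math.ceil((p + s) * m)
    # An empty range (k_min > m) correctly gives probability 0.
    return sum(math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(k_min, m + 1))
```

For instance, with $s = 0.1$ and $m = 200$ the bound is $e^{-4} \approx 0.018$, and both exact tails for $p = 0.3$ lie below it.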
P. Laird. Learning from Good and Bad Data. Kluwer Academic Publishers, 1988.
....In fact, most of the standard PAC learning algorithms would fail if even a small number of the labelled examples given to the learning algorithm were noisy. Two popular noise models for both theoretical and experimental research are the classification noise model introduced by Angluin and Laird [2, 21] and the malicious error model introduced by Valiant [35] and further studied by Kearns and Li [20]. In the classification noise model, each example received by the learner is mislabelled randomly and independently with some fixed probability. In the malicious error model, an adversary is allowed, ....
....The Classification Noise and Malicious Error Models. One criticism of the PAC model is that the data presented to the learner is required to be noise-free. Two popular models of noise for both experimental and theoretical purposes are the classification noise model introduced by Angluin and Laird [2, 21] and the malicious error model introduced by Valiant [35]. The Classification Noise Model. In the classification noise model, the example oracle $EX(f, D)$ is replaced by a noisy example oracle $EX_{CN}^{\eta}(f, D)$. Each time this noisy example oracle is called, an instance $x \in X$ is drawn according to $D$. The ....
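A noisy example oracle of this kind is easy to simulate. The following is a minimal Python sketch. It assumes labels in {0, 1}, a user-supplied sampler `draw` standing in for $D$, and an independent coin of bias $\eta$ per call that decides whether the label is flipped. The factory name `make_noisy_oracle` is hypothetical.

```python
import random
from typing import Callable, Tuple, TypeVar

X = TypeVar("X")


def make_noisy_oracle(
    f: Callable[[X], int],
    draw: Callable[[random.Random], X],
    eta: float,
    rng: random.Random,
) -> Callable[[], Tuple[X, int]]:
    """Return a simulated EX_CN^eta(f, D): each call draws x ~ D and flips f(x) w.p. eta."""
    if not 0.0 <= eta < 0.5:
        raise ValueError("noise rate must satisfy 0 <= eta < 1/2")

    def oracle() -> Tuple[X, int]:
        x = draw(rng)           # instance drawn according to D
        label = f(x)            # correct label f(x) in {0, 1}
        if rng.random() < eta:  # independent coin with bias eta on every call
            label = 1 - label   # mislabel the example
        return x, label

    return oracle
```

For example, `make_noisy_oracle(lambda x: x[0], lambda r: tuple(r.randint(0, 1) for _ in range(5)), 0.2, random.Random(0))` returns an oracle for the target "first bit" under the uniform distribution on $\{0,1\}^5$. About 20% of its labels should be wrong.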
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....The Classification Noise and Malicious Error Models. One criticism of the PAC model is that the data presented to the learner is required to be noise-free. Two popular models of noise for both experimental and theoretical purposes are the classification noise model introduced by Angluin and Laird [2, 21] and the malicious error model introduced by Valiant [35]. The Classification Noise Model. In the classification noise model, the example oracle $EX(f, D)$ is replaced by a noisy example oracle $EX_{CN}^{\eta}(f, D)$. Each time this noisy example oracle is called, an instance $x \in X$ is drawn according to $D$. The ....
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....DNF formulas [13, 20] etc. For all of these classes it is known (using information theoretic arguments) that neither membership queries nor equivalence queries alone suffice. Although different from the goals of our work, there has been work on learning when the examples may be mislabeled [3, 32, 41, 29] and when there is attribute noise [40, 23, 33]. There has also been some work in which the answers to membership queries are noisy or missing [37, 7, 42, 6, 12]. Although we can get a response from our membership oracle, this response is not adversarially generated to model an inconclusive ....
P.D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....3.10 one first constructs bins which then play the role of the points. 5 Summary. Table 1 shows the known results on learning in the malicious and classification noise models. (Footnote: This is a noise model where, independently for every example, the label is inverted with probability $\eta < 1/2$; see e.g. [7] for a survey.) There are still a few open problems. One is the question whether the strong adversary in the lower bound proofs of Theorems 3.9 and 4.5 can be replaced by the weaker KL adversary. Also, it would be interesting to see whether the constant $7/6$ in Theorem 4.2 can be improved to ....
Philip D. Laird. Learning from good and bad data. In Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....the function f is derived from some function in the class C by adding noise to it. Typical works in this direction either tolerate only small amounts of noise [2, 42, 21, 39] (i.e. the function is modified only at a small fraction of all possible inputs) or assume that the noise is random [1, 26, 20, 25, 33, 13, 36] (i.e. the decision of whether or not to modify the function at any given input is made by a random process). In contrast, we take the setting to an extreme, by considering a very large amount of (possibly adversarially chosen) noise. In particular, we consider situations in which the noise ....
Philip D. Laird. Learning From Good and Bad Data. Kluwer Academic Publishers, Boston, 1988.
....required as a function of parameters of the query. Finally, we give a technique for hypothesis testing (required to determine which noise rate guess was correct) which uses fewer examples than the standard technique. This improved hypothesis testing is achieved by generalizing a result of Laird [16]. We begin by giving a new decomposition of P into quantities that may be guessed or estimated using the classification noise oracle. Let $\wedge$, $\oplus$ and $\equiv$ be the standard Boolean operators for AND, exclusive OR and equivalence, respectively. For any query, we define the following four queries: ....
....the task of finding this good hypothesis among those produced. Assume that we have run our SQ algorithms with an accuracy parameter $\epsilon' = \epsilon/2$. Then we can find some good hypothesis through the use of Theorem 8 below. Theorem 8 is based on Theorem 7, a generalization of a theorem due to Laird [16]. Implicit in Laird's Theorem 5.32, there are two important parameters which determine the number of labelled examples sufficient to perform hypothesis testing between two hypotheses using any type of (possibly noisy) example oracle $EX(f, D)$. These parameters are (1) the probability of ....
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
.... $= \Pr_{x \in D}[w(x, y) = (a, b)]$ for every $(a, b) \in \{0,1\}^2$, and denote by $\tilde p_{(a,b)}$ the probability of drawing a noisy observation $(a, b)$. Then
$\tilde p_{(a,b)} = (1-\eta)\,p_{(a,b)} + \eta\,p_{(a,\bar b)}$ (2)
$\tilde p_{(a,\bar b)} = (1-\eta)\,p_{(a,\bar b)} + \eta\,p_{(a,b)}$ (3)
It has been noticed by several authors (e.g. [22, 4, 27]) that, once $\eta$ is known, one may solve the above equations for $p_{(a,b)}$, yielding
$p_{(a,b)} = \dfrac{(1-\eta)\,\tilde p_{(a,b)} - \eta\,\tilde p_{(a,\bar b)}}{1-2\eta}$ (4)
Thus, B can compute a good estimate of $p_{(a,b)}$ from good estimates of $\tilde p_{(a,b)}$ and $\tilde p_{(a,\bar b)}$. It remains to prove the efficiency of B. ....
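The inversion formula for the noise-free pair probability amounts to solving a 2×2 linear system in which the second label is flipped with probability $\eta$. The following is a minimal Python sketch. It assumes, as the passage does, that $\eta$ is known exactly. It shows the forward noise model, the inverse, and the inverse applied to empirical frequencies from a simulated noisy sample. All helper names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def add_noise(p_ab: float, p_abbar: float, eta: float) -> Tuple[float, float]:
    """Forward model: noisy probabilities of (a, b) and (a, 1 - b)."""
    return ((1 - eta) * p_ab + eta * p_abbar, (1 - eta) * p_abbar + eta * p_ab)


def denoise(pt_ab: float, pt_abbar: float, eta: float) -> float:
    """Inverse: recover the noise-free p_(a,b) from the two noisy probabilities."""
    if not 0.0 <= eta < 0.5:
        raise ValueError("eta must lie in [0, 1/2)")
    return ((1 - eta) * pt_ab - eta * pt_abbar) / (1 - 2 * eta)


def simulate_noisy_pairs(dist: Dict[Pair, float], eta: float, n: int,
                         rng: random.Random) -> List[Pair]:
    """Draw n pairs from dist and flip the second component with probability eta."""
    keys = list(dist)
    draws = rng.choices(keys, weights=[dist[k] for k in keys], k=n)
    return [(a, 1 - b) if rng.random() < eta else (a, b) for a, b in draws]


def estimate_from_samples(pairs: List[Pair], a: int, b: int, eta: float) -> float:
    """Plug the empirical frequencies of (a, b) and (a, 1 - b) into the inverse."""
    n = len(pairs)
    pt_ab = sum(1 for pair in pairs if pair == (a, b)) / n
    pt_abbar = sum(1 for pair in pairs if pair == (a, 1 - b)) / n
    return denoise(pt_ab, pt_abbar, eta)
```

The division by $1 - 2\eta$ is the price of noise. Estimation error in the noisy frequencies is amplified as $\eta$ approaches 1/2, so more samples are needed for the same accuracy.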
Philip D. Laird. Learning from good and bad data. Technical Report YALEU/DCS/TR-551, Yale University, 1987. Ph.D. dissertation.
....the function f is derived from some function in the class C by adding noise to it. Typical works in this direction either tolerate only small amounts of noise [2, 41, 21, 39] (i.e. the function is modified only at a small fraction of all possible inputs) or assume that the noise is random [1, 26, 20, 25, 33, 13, 36] (i.e. the decision of whether or not to modify the function at any given input is made by a random process). In contrast, we take the setting to an extreme, by considering a very large amount of (possibly adversarially chosen) noise. In particular, we consider situations in which the noise ....
Philip D. Laird. Learning From Good and Bad Data. Kluwer Academic Publishers, Boston, 1988.
....u-literals. The next three items have to do with elementary substitutions w.r.t. existential variables and adding e-literals. The last item has to do with elementary eu-substitutions. Note that there are only a finite number of non-alphabetical variants in $\rho(\phi)$. Hence $\rho$ is locally finite (see [Laird 88, NW97]). 1. For an elementary u-unification $\theta = \{y/x\}$, where $x, y \in \mathrm{uVar}(\phi)$, let $\phi\theta \in \rho(\phi)$. 2. For $x \in \mathrm{uVar}(\phi)$ and an elementary u-substitution $\theta = \{x/f(y_1, \dots, y_k)\}$, let $\phi\theta \in \rho(\phi)$. 3. Let $L = p(y_1, \dots, y_k)$ or $\neg p(y_1, \dots, y_k)$, where $y_1, \dots, y_k$ are new distinct variables w.r.t. ....
....new constant not in $\phi$ and remove all the existential variables from $Q(\phi)$. Let the new PCNF be … . Then the variables in … are universally quantified. Let $M(\dots)$ consist of the clauses $C_1, C_2, \dots, C_m$. Then $\Box$ subsumes $C_i$ for every $i$. 2. Similar to a result about the classic refinement operator [S81, Laird 88, LN94, NW97], we can prove that there is a chain from $\Box$ to every $C_i$. The combination of these chains will give a chain from … to … . 3. By using the elementary e-substitutions (item 4 of $\rho$) we can change the constant occurrences in … back to existential variables. This establishes a ....
P. D. Laird, Learning from Good and Bad Data, Kluwer Academic Publishers, Boston, MA, 1988.
....algorithms, instead of using a single example for updating, a weighted average of a sample of misclassified examples and a correction vector is used to update the current weight vector. The correction vector is simply the mean of the normalized examples. Previous work by Angluin and Laird [1, 8] demonstrated polynomial learnability for 1-out-of-k and k-out-of-k functions in the presence of classification noise. This paper improves on their result by extending polynomial learnability to a much wider concept class. Angluin and Laird's algorithm uses $O(n^4 \epsilon^{-3} \ln(n/\delta))$ ....
....(assuming $k = O(n)$). If w is a Boolean threshold function, then $\sigma$ is $\Omega(\dots/\sqrt{n})$ (assuming there is no need to worry about the average distance from any hyperplane), and so $L$ is $O(n^2 \epsilon^{-2})$ and $M$ is $O(n^4 \epsilon^{-2} \ln(n/(\delta\epsilon)))$. Angluin and Laird's algorithm [1, 8] for learning 1-out-of-k and k-out-of-k functions from n attributes uses $O(n^4 \epsilon^{-3} \ln(n/\delta))$ examples but only one iteration, so the new bound is worse by $O(n^2/\epsilon) \ln(1/\epsilon)$, but for the more general class of Boolean threshold functions. Another advantage of Angluin and Laird's ....
P. D. Laird. Learning from Good and Bad Data. Kluwer, Norwell, Massachusetts, 1988.
....the PAC and exact models, both with and without membership queries, assumes that examples are labeled either positive or negative. In these situations the border between the positive and negative examples is well defined. There has been work addressing the issue of mislabeled training examples [AL88, Lai88, Slo88, SV88, SS92, Kea93, KL93, GS95, RR95]. In these situations, the border between the positive and negative examples may appear blurry to the learner, but this is just the result of the noise process that has been applied to the properly labeled example. There has also been some work considering learning from noisy membership queries ....
P. Laird. Learning from good and bad data. In Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....logic, bisimulation, process calculus, process algebra. 1 Introduction. The theoretical study of inductive inference began with Moore's identification of sequential machines in the 1950s. It then developed into theories of identification for systems and language grammars [1, 5, 8, 12, 15]. The study of process calculi started in the latter half of the 1970s, to give mathematical semantics for concurrent processes. Typical systems are CSP by Hoare [4] and CCS by Milner [10]. In Feb. 1990, ISO adopted LOTOS [2] as the international standard specification description language for OSI. ....
Laird, P.D., Learning from Good and Bad Data, Kluwer Academic Pub. 1988.
....m would solve the problem of learning DNF. Observe that the class considered by Frazier et al. is a generalization of the class of DNF formulas in which all variables only appear negated. While there has been some work addressing the general issue of mislabeled training examples in the PAC model [3, 26, 36, 25], there has been little research on learning geometric concepts with noise. Auer [6] investigates exact learning of boxes where some of the counterexamples, given in response to equivalence queries, are noisy. Auer shows that $\mathrm{box}^d_n$ is learnable using hypotheses from $\mathrm{box}^d_n$ if and only if ....
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....from $\Box$ to q. Also note that this sentence is a Horn sentence, and $\rho_0$ is not a complete refinement operator for reduced Horn sentences either. 5 A most general refinement operator for general sentences. In the previous section we have shown that Shapiro's refinement operator $\rho_0$ is not complete. Laird [2] has defined a refinement operator for general first-order sentences. As shown before, the incompleteness of $\rho_0$ is merely a result of not accepting refinements that are not reduced. A solution for defining a complete most general refinement operator can be dropping the condition of ....
....define his refinement operator. Instead of sentences $A \leftarrow B$ where A and B are sets of atoms, Laird considers clauses of a language $L_L$ where repetition of literals is allowed. Therefore substitutions are never decreasing. Also, (refinements of) clauses are not required to be reduced. Definition 1 [2]. Let $p = A \leftarrow B$ be a clause in the language $L_L$. Then $q \in \rho_L(p)$ when exactly one of the following holds: 1. $q = p\theta$, where $\theta = \{x/y\}$ and both variables x and y occur in p. 2. $q = p\theta$, where $\theta = \{x/t\}$ and t is a most general term. 3. $q = A, P \leftarrow B$, where P is a most general atom. 4. $q = A \leftarrow B, P$, ....
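Laird's definition can be made concrete with a small Python sketch of items 1 to 4 as they appear in this snippet. The definition is cut off after item 4, so any further items are not covered. The clause representation, the predicate and function signatures passed as dictionaries, and the fresh-variable names `V0, V1, ...` are hypothetical choices for illustration, not Laird's notation or code. "Most general term" and "most general atom" are read here as a function or predicate symbol applied to distinct fresh variables.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical representation: a term is a variable name (str) or (function, args);
# an atom is (predicate, args); a clause is (A, B), read as A <- B. Tuples are used
# so that repeated literals are allowed, as in Laird's language.
Term = Union[str, Tuple[str, tuple]]
Atom = Tuple[str, Tuple[Term, ...]]
Clause = Tuple[Tuple[Atom, ...], Tuple[Atom, ...]]


def term_vars(t: Term) -> List[str]:
    if isinstance(t, str):
        return [t]
    return [v for arg in t[1] for v in term_vars(arg)]


def clause_vars(c: Clause) -> List[str]:
    """Variables of c in order of first occurrence."""
    seen: List[str] = []
    for _, args in c[0] + c[1]:
        for arg in args:
            for v in term_vars(arg):
                if v not in seen:
                    seen.append(v)
    return seen


def subst(c: Clause, x: str, r: Term) -> Clause:
    """Apply the substitution {x/r} to every atom of c."""
    def on_term(t: Term) -> Term:
        if isinstance(t, str):
            return r if t == x else t
        return (t[0], tuple(on_term(a) for a in t[1]))

    def on_atom(a: Atom) -> Atom:
        return (a[0], tuple(on_term(arg) for arg in a[1]))

    return (tuple(map(on_atom, c[0])), tuple(map(on_atom, c[1])))


def fresh(c: Clause, k: int) -> Tuple[str, ...]:
    """k distinct variable names V0, V1, ... that do not occur in c."""
    used, names, i = set(clause_vars(c)), [], 0
    while len(names) < k:
        if f"V{i}" not in used:
            names.append(f"V{i}")
        i += 1
    return tuple(names)


def refine(c: Clause, preds: Dict[str, int], funcs: Dict[str, int]) -> Iterator[Clause]:
    """Yield candidate refinements of c following items 1-4 of Definition 1."""
    vs = clause_vars(c)
    for x, y in combinations(vs, 2):      # item 1: {x/y}; {y/x} only gives a variant
        yield subst(c, x, y)
    for x in vs:                          # item 2: {x/t}, t = f(fresh variables)
        for f, k in funcs.items():
            yield subst(c, x, (f, fresh(c, k)))
    for p, k in preds.items():            # items 3 and 4: add P(fresh variables)
        atom = (p, fresh(c, k))
        yield (c[0] + (atom,), c[1])      # item 3: A, P <- B
        yield (c[0], c[1] + (atom,))      # item 4: A <- B, P
```

Starting from the empty clause `((), ())` with `preds={"p": 2}`, the operator yields `p(V0, V1) <-` and `<- p(V0, V1)`. Refining the first of these with item 1 unifies its two variables into `p(V1, V1) <-`.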
P.D. Laird. Learning from Good and Bad Data. Kluwer Academic Publishers, 1988.
....the incompleteness of $\rho_0$ with examples. Also, we discuss the complexity measure rsize and its shortcomings when related to subsumption. In Sect. 4 we define a new refinement operator and a new complexity measure. In Sect. 5 we compare our new refinement operator with another refinement operator [2] that is complete for first-order sentences. In Sect. 6 we look at refinements in a wider framework. We relate refinement operators to their duals, generalization operators, and these generalization operators to inverse resolution and model inference. These relations will also be a subject for ....
....above, are a motivation for adopting it as a complexity measure to restrict the search space of refinements. Using these new definitions, $\rho_r$ is a refinement operator and it behaves as Shapiro thought $\rho_0$ would: it is complete for reduced sentences. 5 Comparison with Laird's $\rho_L$. Laird [2] has also defined a refinement operator, $\rho_L$. He uses a different notation to define his refinement operator. Instead of sentences $C \leftarrow D$ where C and D are sets of atoms, Laird considers clauses of a language $L_L$ where repetition of literals is allowed, substitutions are never decreasing, and also ....
P.D. Laird. Learning from Good and Bad Data. Kluwer Academic Publishers, 1988.
....mO(e(1 2h) 2 ) logl 2loglogl that for $n^l$ large enough, implies the thesis. Remark 1. We note that, as far as $n$, $\epsilon$ and $\eta$ are concerned, this lower bound essentially matches the upper bound for this class based on $\epsilon$-covering found in [5], with the improvements suggested by Laird [15]. Indeed, an $\epsilon$-cover for C is the one made up of all monotone monomials of at most $\log(1/\epsilon)$ literals, and its cardinality is essentially of the same order of magnitude as $n^l$ (at least for $\epsilon = 1/\mathrm{poly}(n)$). Class C of parity functions on $X = \{0,1\}^n$ is the class of ....
P.D. Laird, Learning from good and bad data (Kluwer Academic Publishers, Boston, 1988).
....noise. This extension was first examined in the learning theory literature by Angluin and Laird [1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1, 15, 24, 11], the classification noise model has become a common paradigm for experimental machine learning research. Angluin and Laird provided an algorithm for learning Boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1/2. Subsequently, there have been some ....
....$1 - \delta$ satisfies $\mathrm{error}(h) \le \epsilon$. This probability is taken over the random draws from $D$ made by $EX(f, D)$ and any internal randomization of L. We call $\epsilon$ the accuracy parameter and $\delta$ the confidence parameter. 3 The Classification Noise Model. The well-studied classification noise model [1, 15, 11, 24, 14, 20, 22] is an extension of the Valiant model intended to capture the simplest type of white noise in the labels seen by the learner. We introduce a parameter $0 \le \eta < 1/2$ called the noise rate, and replace the oracle $EX(f, D)$ with the faulty oracle $EX_{CN}^{\eta}(f, D)$, where the subscript is the acronym for ....
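Angluin and Laird's analysis of this model is commonly quoted as a sample-size bound for a simple strategy. The learner picks, from a finite class of $N$ hypotheses, one that minimizes disagreements with the noisy sample. The bound is $m \ge \frac{2}{\epsilon^2 (1-2\eta_b)^2} \ln\frac{2N}{\delta}$, where $\eta_b < 1/2$ is a known upper bound on $\eta$. The following Python sketch only evaluates that expression. The constants follow the commonly quoted form and should be checked against [1] before relying on them. The function name is hypothetical.

```python
import math


def angluin_laird_sample_size(n_hyp: int, eps: float, delta: float, eta_b: float) -> int:
    """Smallest integer m with m >= 2 / (eps^2 (1 - 2 eta_b)^2) * ln(2 N / delta).

    N = n_hyp is the size of a finite hypothesis class and eta_b < 1/2 is a known
    upper bound on the noise rate (assumed form of the bound, see lead-in).
    """
    if n_hyp < 1 or not (0 < eps < 1 and 0 < delta < 1 and 0 <= eta_b < 0.5):
        raise ValueError("need N >= 1, 0 < eps, delta < 1 and 0 <= eta_b < 1/2")
    m = 2.0 / (eps ** 2 * (1.0 - 2.0 * eta_b) ** 2) * math.log(2.0 * n_hyp / delta)
    return math.ceil(m)
```

The $(1-2\eta_b)^{-2}$ factor dominates as the noise bound approaches 1/2. Halving the gap $1/2 - \eta_b$ quadruples the bound.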
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....required as a function of parameters of the query. Finally, we give a technique for hypothesis testing (required to determine which noise rate guess was correct) which uses fewer examples than the standard technique. This improved hypothesis testing is achieved by generalizing a result of Laird [14]. We begin by giving a new decomposition of P into quantities that may be guessed or estimated using the classification noise oracle. Let $\wedge$, $\oplus$ and $\equiv$ be the standard binary operators for AND, exclusive OR and equivalence, respectively. For any query, we define the following four queries: ....
....hypothesis. We are now left with the task of finding this good hypothesis among those produced. Assume that we have run our SQ algorithms with an accuracy parameter $\epsilon' = \epsilon/2$. Then we can find some good hypothesis through the use of the following generalization of a theorem due to Laird [14]. We give the proof of this lemma in the full paper. Lemma 6. Let $H$ be a set of hypotheses at least one of which has true error rate at most $\epsilon_1$, and let $\epsilon_2$ be any threshold strictly larger than $\epsilon_1$. Then the hypothesis in $H$ which minimizes disagreements with respect to a sample of size O ....
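Minimizing disagreements works under classification noise because a hypothesis with true error $e$ disagrees with a noisy label with probability $e(1-\eta) + (1-e)\eta = \eta + e(1-2\eta)$. This rate is increasing in $e$ whenever $\eta < 1/2$, so the ranking by disagreements on a large enough sample matches the ranking by true error. The following is a minimal Python sketch of the selection step. It does not compute the sample size from the lemma. The test-data generator, with its target $f(x) = x_0$ over uniform $\{0,1\}^n$, is a hypothetical example.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]
Hypothesis = Callable[[Tuple[int, ...]], int]


def min_disagreement(hypotheses: Sequence[Hypothesis],
                     sample: Sequence[Example]) -> Tuple[int, int]:
    """Return (index, count) of the hypothesis with the fewest disagreements."""
    best_idx, best_err = -1, len(sample) + 1
    for i, h in enumerate(hypotheses):
        err = sum(1 for x, y in sample if h(x) != y)  # disagreements with noisy labels
        if err < best_err:                             # ties keep the earlier hypothesis
            best_idx, best_err = i, err
    return best_idx, best_err


def noisy_sample(m: int, n: int, eta: float, rng: random.Random) -> List[Example]:
    """Hypothetical data: uniform x in {0,1}^n, target f(x) = x[0], flipped w.p. eta."""
    sample: List[Example] = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = x[0]
        if rng.random() < eta:
            y = 1 - y
        sample.append((x, y))
    return sample
```

In this example, the hypotheses are written as `[lambda x, i=i: x[i] for i in range(5)]`. The `i=i` default freezes each index, because a plain closure would bind every lambda to the last `i`. On `noisy_sample(2000, 5, 0.2, random.Random(1))`, the projection onto `x[0]` should win: its expected disagreement rate is 0.2, while every other projection disagrees at rate 1/2.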
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
....as an approach to answering the above question. • The work reported in this paper assumes noise-free training data. In practice, however, training examples are often subject to various kinds of noise affecting the values of the features and/or the classification of the training examples [13, 15]. A direct way to deal with noise is to modify the given algorithms by relaxing the requirement of covering all the conflicts generated from the training data. That is, we search for a small set of features that may leave a certain percentage of the conflicts uncovered, where such percentage can ....
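One way to realize this relaxed requirement is a greedy partial set cover. The following Python sketch is an assumption, not the cited authors' algorithm. It treats a conflict as a pair of training examples with different class labels, and a feature covers a conflict when the two examples take different values on it. Features are added greedily until at most a fraction `max_uncovered` of the conflicts remain. `max_uncovered` is a hypothetical parameter standing for the "certain percentage".

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def conflicts(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[Tuple[int, int]]:
    """Index pairs of training examples whose class labels differ."""
    return [(i, j) for i, j in combinations(range(len(X)), 2) if y[i] != y[j]]


def greedy_partial_cover(X: Sequence[Sequence[int]], y: Sequence[int],
                         max_uncovered: float) -> List[int]:
    """Greedily pick features until at most a max_uncovered fraction of conflicts remain."""
    pairs = conflicts(X, y)
    n_features = len(X[0]) if X else 0
    # Feature f covers conflict (i, j) when the two examples differ on f.
    covers = [{(i, j) for (i, j) in pairs if X[i][f] != X[j][f]} for f in range(n_features)]
    uncovered: Set[Tuple[int, int]] = set(pairs)
    allowed = int(max_uncovered * len(pairs))  # conflicts we may leave uncovered
    chosen: List[int] = []
    while len(uncovered) > allowed:
        best = max(range(n_features), key=lambda f: len(covers[f] & uncovered), default=None)
        if best is None or not covers[best] & uncovered:
            break  # the rest cannot be covered, e.g. identical examples with flipped labels
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

Label noise can produce identical feature vectors with different classes. No feature can cover such a conflict, so the loop stops instead of searching forever. This is exactly the situation the relaxed requirement is meant to tolerate.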
P. Laird, Learning from Good and Bad Data (Kluwer Academic, Boston, Massachusetts, 1988).
Laird, P. D., (1988), Learning from Good and Bad Data, Norwell MA: Kluwer Academic Publishers.
P. D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
Philip D. Laird. Learning from Good and Bad Data. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
Philip D. Laird. Learning From Good and Bad Data. Kluwer Academic Publishers, Boston, 1988.
CiteSeer.IST - Copyright Penn State and NEC