Results 1  10
of
95
Property Testing and its connection to Learning and Approximation
"... We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the fun ..."
Abstract

Cited by 498 (68 self)
 Add to MetaCart
We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the function on instances of its choice. First, we establish some connections between property testing and problems in learning theory. Next, we focus on testing graph properties, and devise algorithms to test whether a graph has properties such as being kcolorable or having a aeclique (clique of density ae w.r.t the vertex set). Our graph property testing algorithms are probabilistic and make assertions which are correct with high probability, utilizing only poly(1=ffl) edgequeries into the graph, where ffl is the distance parameter. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph which corre...
Cryptographic Limitations on Learning Boolean Formulae and Finite Automata
 PROCEEDINGS OF THE TWENTYFIRST ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 1989
"... In this paper we prove the intractability of learning several classes of Boolean functions in the distributionfree model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntact ..."
Abstract

Cited by 347 (15 self)
 Add to MetaCart
In this paper we prove the intractability of learning several classes of Boolean functions in the distributionfree model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntactic form in which the learner chooses to represent its hypotheses. Our methods reduce the problems of cracking a number of wellknown publickey cryptosystems to the learning problems. We prove that a polynomialtime learning algorithm for Boolean formulae, deterministic finite automata or constantdepth threshold circuits would have dramatic consequences for cryptography and number theory: in particular, such an algorithm could be used to break the RSA cryptosystem, factor Blum integers (composite numbers equivalent to 3 modulo 4), and detect quadratic residues. The results hold even if the learning algorithm is only required to obtain a slight advantage in prediction over random guessing. The techniques used demonstrate an interesting duality between learning and cryptography. We also apply our results to obtain strong intractability results for approximating a generalization of graph coloring.
Principles and methods of Testing Finite State Machines  a survey
 PROCEEDINGS OF IEEE
, 1996
"... With advanced computer technology, systems are getting larger to fulfill more complicated tasks, however, they are also becoming less reliable. Consequently, testing is an indispensable part of system design and implementation; yet it has proved to be a formidable task for complex systems. This moti ..."
Abstract

Cited by 334 (14 self)
 Add to MetaCart
With advanced computer technology, systems are getting larger to fulfill more complicated tasks, however, they are also becoming less reliable. Consequently, testing is an indispensable part of system design and implementation; yet it has proved to be a formidable task for complex systems. This motivates the study of testing finite state machines to ensure the correct functioning of systems and to discover aspects of their behavior. A finite state machine contains a finite number of states and produces outputs on state transitions after receiving inputs. Finite state machines are widely used to model systems in diverse areas, including sequential circuits, certain types of programs, and, more recently, communication protocols. In a testing problem we have a machine about which we lack some information; we would like to deduce this information by providing a sequence of inputs to the machine and observing the outputs produced. Because of its practical importance and theoretical interest, the problem of testing finite state machines has been studied in different areas and at various times. The earliest published literature on this topic dates back to the 50’s. Activities in the 60’s and early 70’s were motivated mainly by automata theory and sequential circuit testing. The area seemed to have mostly died down until a few years ago when the testing problem was resurrected and is now being studied anew due to its applications to conformance testing of communication protocols. While some old problems which had been open for decades were resolved recently, new concepts and more intriguing problems from new applications emerge. We review the fundamental problems in testing finite state machines and techniques for solving these problems, tracing progress in the area from its inception to the present and the state of the art. In addition, we discuss extensions of finite state machines and some other topics related to testing.
A Fast AutomatonBased Method for Detecting Anomalous Program Behaviors
 In Proceedings of the 2001 IEEE Symposium on Security and Privacy
, 2001
"... Forrest et al introduced a new intrusion detection approach that identifies anomalous sequences of system calls executed by programs. Since their work, anomaly detection on system call sequences has become perhaps the most successful approach for detecting novel intrusions. A natural way for learnin ..."
Abstract

Cited by 223 (5 self)
 Add to MetaCart
(Show Context)
Forrest et al introduced a new intrusion detection approach that identifies anomalous sequences of system calls executed by programs. Since their work, anomaly detection on system call sequences has become perhaps the most successful approach for detecting novel intrusions. A natural way for learning sequences is to use a finitestate automaton (FSA). However, previous research seemed to indicate that FSAlearning is computationally expensive, that it cannot be completely automated, or that the space usage of the FSA may be excessive. We present a new approach in this paper that overcomes these difficulties. Our approach builds a compact FSA in a fully automatic and efficient manner, without requiring access to source code for programs. The space requirements for the FSA is low  of the order of a few kilobytes for typical programs. The FSA uses only a constant time per system call during the learning as well as detection period. This factor leads to low overheads for intrusion detection. Unlike many of the previous techniques, our FSAtechnique can capture both short term and long term temporal relationships among system calls, and thus perform more accurate detection. For instance, the FSA can capture common program structures such as branches, joins, loops etc. This enables our approach to generalize and predict future behaviors from past behaviors. For instance, if a program executed a loop once in an execution, the FSA approach can generalize and predict that the same loop may be executed zero or more times in subsequent executions. As a result, the training periods needed for our FSA based approach are shorter. Moreover, false positives are reduced without increasing the likelihood of missing attacks. This paper describes our FSA based technique and presents a ...
ModelCarrying Code: A Practical Approach for Safe Execution of Untrusted Applications
, 2003
"... This paper presents a new approach called modelcarrying code (MCC) for safe execution of untrusted code. At the heart of MCC is the idea that untrusted code comes equipped with a concise highlevel model of its securityrelevant behavior. This model helps bridge the gap between highlevel security p ..."
Abstract

Cited by 107 (14 self)
 Add to MetaCart
This paper presents a new approach called modelcarrying code (MCC) for safe execution of untrusted code. At the heart of MCC is the idea that untrusted code comes equipped with a concise highlevel model of its securityrelevant behavior. This model helps bridge the gap between highlevel security policies and lowlevel binary code, thereby enabling analyses which would otherwise be impractical. For instance, users can use a fully automated verification procedure to determine if the code satisfies their security policies. Alternatively, an automated procedure can sift through a catalog of acceptable policies to identify one that is compatible with the model. Once a suitable policy is selected, MCC guarantees that the policy will not be violated by the code. Unlike previous approaches, the MCC framework enables code producers and consumers to collaborate in order to achieve safety. Moreover, it provides support for policy selection as well as enforcement. Finally, MCC makes no assumptions regarding the inherent risks associated with untrusted code. It simply provides the tools that enable a consumer to make informed decisions about the risk that he/she is willing to tolerate so as to benefit from the functionality offered by an untrusted application.
On the Computational Complexity of Approximating Distributions by Probabilistic Automata
 Machine Learning
, 1990
"... We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of ap ..."
Abstract

Cited by 95 (0 self)
 Add to MetaCart
We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, wellstudied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample as well as computational complexity. We show that the number of examples required for training PAs is moderate  essentially linear in the number of transition probabilities to be trained and a lowdegree polynomial in the example l...
Inductive Inference, DFAs and Computational Complexity
 2nd Int. Workshop on Analogical and Inductive Inference (AII
, 1989
"... This paper surveys recent results concerning the inference of deterministic finite automata (DFAs). The results discussed determine the extent to which DFAs can be feasibly inferred, and highlight a number of interesting approaches in computational learning theory. 1 ..."
Abstract

Cited by 93 (1 self)
 Add to MetaCart
(Show Context)
This paper surveys recent results concerning the inference of deterministic finite automata (DFAs). The results discussed determine the extent to which DFAs can be feasibly inferred, and highlight a number of interesting approaches in computational learning theory. 1
Diversitybased Inference of Finite Automata
 Journal of ACM
, 1994
"... Abstract. We present new procedures for inferring the structure of a finitestate automaton (FSA) from its input \ output behavior, using access to the automaton to perform experiments. Our procedures use a new representation for finite automata, based on the notion of equivalence between tesfs. We ..."
Abstract

Cited by 84 (2 self)
 Add to MetaCart
Abstract. We present new procedures for inferring the structure of a finitestate automaton (FSA) from its input \ output behavior, using access to the automaton to perform experiments. Our procedures use a new representation for finite automata, based on the notion of equivalence between tesfs. We call the number of such equivalence classes the diLersL@of the automaton; the diversity may be as small as the logarithm of the number of states of the automaton. For the special class of pennatatton aatornata, we describe an inference procedure that runs in time polynomial in the diversity and log(l/6), where 8 is a given upper bound on the probability that our procedure returns an incorrect result. (Since our procedure uses randomization to perform experiments, there is a certain controllable chance that it will return an erroneous result.) We also discuss techniques for handling more general automata. We present evidence for the practical efficiency of our approach. For example, our procedure is able to infer the structure of an automaton based on Rubik’s Cube (which has approximately 10 lY states) in about 2 minutes on a DEC MicroVax. This automaton is many orders of magnitude larger than possible with previous techniques, which would require time proportional at least to the number of global states. (Note that in this example, only a small fraction (1014, of the global
What is the Search Space of the Regular Inference?
 In Proceedings of the Second International Colloquium on Grammatical Inference (ICGI'94
, 1994
"... This paper revisits the theory of regular inference, in particular by extending the definition of structural completeness of a positive sample and by demonstrating two basic theorems. This framework enables to state the regular inference problem as a search through a boolean lattice built from the p ..."
Abstract

Cited by 52 (7 self)
 Add to MetaCart
(Show Context)
This paper revisits the theory of regular inference, in particular by extending the definition of structural completeness of a positive sample and by demonstrating two basic theorems. This framework enables to state the regular inference problem as a search through a boolean lattice built from the positive sample. Several properties of the search space are studied and generalization criteria are discussed. In this framework, the concept of border set is introduced, that is the set of the most general solutions excluding a negative sample. Finally, the complexity of regular language identification from both a theoritical and a practical point of view is discussed. 1 Introduction Regular inference is the process of learning a regular language from a set of examples, consisting of a positive sample, i.e. a finite subset of a regular language. A negative sample, i.e. a finite set of strings not belonging to this language, may also be available. This problem has been studied as early as th...
Efficient Learning of Typical Finite Automata from Random Walks
, 1997
"... This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an averagecase setting to model the ``typical'' labeling of a finite automaton, while retaining a worstcase model for ..."
Abstract

Cited by 50 (9 self)
 Add to MetaCart
This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an averagecase setting to model the ``typical'' labeling of a finite automaton, while retaining a worstcase model for the underlying graph of the automaton, along with (2) a learning model in which the learner is not provided with the means to experiment with the machine, but rather must learn solely by observing the automaton's output behavior on a random input sequence. The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model. We adopt an online learning model in which the learner is asked to predict the output of the next state, given the next symbol of the random input sequence; the goal of the learner is to make as few prediction mistakes as possible. Assuming the learner has a means of resetting the target machine to a fixed start state, we first present an efficient algorithm that