The Viterbi algorithm
 Proceedings of the IEEE
, 1973
Abstract

Good applications for crummy machine translation
 Machine Translation
, 1993
Abstract

Ideally, we might hope to improve the performance of our MT systems by improving the system, but it might be even more important to improve performance by looking for a more appropriate application. A survey of the literature on evaluation of MT systems seems to suggest that the success of the evaluation often depends very strongly on the selection of an appropriate application. If the application is well chosen, then it often becomes fairly clear how the system should be evaluated. Moreover, the evaluation is likely to make the system look good. Conversely, if the application is not clearly identified (or worse, if the application is poorly chosen), then it is often very difficult to find a satisfying evaluation paradigm. We begin our discussion with a brief review of some evaluation metrics that have been tried in the past and conclude that it is difficult to identify a satisfying evaluation paradigm that will make sense over all possible applications. It is probably wise to identify the application first; then we will be in a much better position to address evaluation questions. The discussion will then turn to the main point, an essay on how to pick a good niche application for state-of-the-art (crummy) machine translation.
Design of a Linguistic Postprocessor using Variable Memory Length Markov Models
 In International Conference on Document Analysis and Recognition
, 1995
Abstract

We present the design of a linguistic postprocessor for character recognizers. The central module of our system is a trainable variable memory length Markov model (VLMM) that predicts the next character given a variable-length window of past characters. The overall system is composed of several finite-state automata, including the main VLMM and a proper noun VLMM. The best model reported in the literature (Brown et al. 1992) achieves 1.75 bits per character on the Brown corpus. On that same corpus, our model, trained on one tenth as much data, reaches 2.19 bits per character and is 200 times smaller (approximately 160,000 parameters). The model was designed for handwriting recognition applications but can also be used for other OCR problems and for speech recognition.
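To make the "variable-length window of past characters" idea and the bits-per-character figures concrete, here is a minimal Python sketch. It is not the VLMM of the paper: the class name `VariableOrderCharModel`, the `max_order` and `min_count` parameters, and the rule "use the longest context seen at least `min_count` times, with add-one smoothing" are illustrative assumptions. Bits per character is the average of -log2 P(next character | history).

```python
import math
from collections import defaultdict


class VariableOrderCharModel:
    """Hypothetical variable-memory character predictor (not the paper's VLMM).

    Predicts the next character from the longest preceding context (up to
    max_order characters) that occurred at least min_count times in training,
    with add-one smoothing over the training alphabet.
    """

    def __init__(self, max_order=3, min_count=2):
        self.max_order = max_order
        self.min_count = min_count
        # counts[context][next_char] -> how often next_char followed context
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, text):
        self.alphabet.update(text)
        for i, ch in enumerate(text):
            # record ch under every context of length 0..max_order ending at i
            for k in range(min(self.max_order, i) + 1):
                self.counts[text[i - k:i]][ch] += 1

    def prob(self, history, ch):
        size = len(self.alphabet)
        # back off from the longest usable context to the empty context
        for k in range(min(self.max_order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if table is not None:
                total = sum(table.values())
                if total >= self.min_count:
                    return (table.get(ch, 0) + 1) / (total + size)
        return 1.0 / max(size, 1)

    def bits_per_char(self, text):
        # average code length -log2 P(c_i | c_0..c_{i-1}), in bits per character
        bits = sum(-math.log2(self.prob(text[:i], ch)) for i, ch in enumerate(text))
        return bits / len(text)
```

Trained on a highly repetitive string such as `"ab" * 50`, the model scores `"abab"` at well under one bit per character, because after the first symbol every prediction comes from a long, frequently observed context. A genuine VLMM typically decides which contexts to keep by a statistical criterion rather than a fixed count threshold.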
Predictability, Complexity, and Learning
, 2001
Abstract

We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
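A small worked case of the first regime, where Ipred(T) stays finite: for a stationary first-order Markov chain with known transition probabilities, the future depends on the past only through the last past state X_T, so Ipred(T) = I(X_T; X_{T+1}) = H(X) - H(X_{T+1} | X_T) for every T >= 1. The Python sketch below evaluates this for a two-state chain; the function names and the parameterization p = P(0 -> 1), q = P(1 -> 0) are our own. It does not cover the logarithmic or power-law regimes, which arise when model parameters have to be learned from the series.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in dist if x > 0)


def predictive_information_two_state(p, q):
    """Ipred(T), for any T >= 1, of a stationary two-state Markov chain.

    p = P(0 -> 1) and q = P(1 -> 0), with p + q > 0. By the Markov property
    the future depends on the past only through X_T, so
    Ipred(T) = H(X) - H(X_{T+1} | X_T), independent of T.
    """
    if p + q <= 0:
        raise ValueError("need p + q > 0 for a unique stationary distribution")
    # stationary distribution solves pi0 * p = pi1 * q
    pi0, pi1 = q / (p + q), p / (p + q)
    h_marginal = entropy([pi0, pi1])
    h_conditional = pi0 * entropy([1 - p, p]) + pi1 * entropy([q, 1 - q])
    return h_marginal - h_conditional
```

With p = q = 0.5 the chain is an i.i.d. fair coin and the result is 0 bits; with p = q = 1 it alternates deterministically and the result is 1 bit. Either way the value does not grow with T.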
One-Dimensional and Multi-Dimensional Substring Selectivity Estimation
, 2000
"... this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based cha ..."
Abstract

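The snippet names the Maximal Overlap (MO) technique without giving its formula. As a hedged illustration, assuming MO estimates a query's selectivity by chaining conditional probabilities over overlapping pieces found in the pruned count-suffix tree, Pr(q) ≈ C(b1)/N × ∏ C(b_i)/C(overlap of b_{i-1} and b_i), a simplified Python sketch follows. A plain dictionary of substring counts stands in for the PST, and the greedy parse (furthest reach first, then longest overlap) is our own choice; MOC and MOLC are not shown. The paper should be consulted for the exact definitions.

```python
from collections import Counter


def build_count_table(strings, min_count=1):
    """Hypothetical stand-in for a pruned count-suffix tree.

    Maps each substring to the number of database strings containing it and
    drops substrings with count below min_count. A substring is contained in
    at least as many strings as any string containing it, so the kept set is
    closed under taking substrings.
    """
    counts = Counter()
    for s in strings:
        counts.update({s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)})
    table = {sub: c for sub, c in counts.items() if c >= min_count}
    table[""] = len(strings)  # every string contains the empty string
    return table


def mo_estimate(query, table):
    """Selectivity estimate in the spirit of Maximal Overlap (MO).

    Returns 0.0 if some character of the query is missing from the table.
    """
    n = table[""]
    # first piece: the longest prefix of the query present in the table
    end = 0
    while end < len(query) and query[:end + 1] in table:
        end += 1
    if end == 0:
        return 0.0 if query else 1.0
    estimate = table[query[:end]] / n
    prev_start = 0
    while end < len(query):
        best = None
        # a new piece starts inside the previous one (or right after it)
        # and must extend past the current end
        for s in range(prev_start + 1, end + 1):
            e = end
            while e < len(query) and query[s:e + 1] in table:
                e += 1
            # prefer the furthest reach; ties keep the smaller s (longer overlap)
            if e > end and (best is None or e > best[1]):
                best = (s, e)
        if best is None:
            return 0.0
        s, e = best
        # multiply by Pr(piece | overlap with the previous piece)
        estimate *= table[query[s:e]] / table[query[s:end]]
        prev_start, end = s, e
    return estimate
```

For the toy database `["abc", "abd", "bcd", "xyz"]`, the query `"abcd"` is split into `"abc"` and `"bcd"` with overlap `"bc"`, giving 1/4 × 1/2 = 0.125 even though no string actually contains `"abcd"`. This shows the conditional-independence assumption that such chained estimates rely on.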
Substring selectivity estimation
 In Proceedings of the ACM Symposium on Principles of Database Systems
, 1999
Abstract

We study the problem of estimating the selectivity of approximate substring queries. Its importance in databases is ever increasing as more and more data are input by users and are integrated with many typographical errors and different spelling conventions. To begin with, we consider edit distance as the measure of similarity between a pair of strings. Based on information stored in an extended N-gram table, we propose two estimation algorithms, MOF and LBS, for the task. The latter extends the former with ideas from set hashing signatures. The experimental results show that MOF is a lightweight algorithm that gives fairly accurate estimations. However, if more space is available, LBS can give better accuracy than MOF and other baseline methods. Next, we extend the proposed solution to other similarity predicates: the SQL LIKE operator and Jaccard similarity.
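Both estimators in the abstract approximate a quantity that can be computed exactly by brute force. The Python sketch below does that, assuming (our reading, not stated in the abstract) that an approximate substring query with threshold k selects a string when some substring of it lies within edit distance k of the query. The minimum edit distance over all substrings uses Sellers' dynamic program, whose first row is all zeros so that a match may begin anywhere in the string. Function names are illustrative.

```python
def min_substring_edit_distance(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Sellers' dynamic program: row 0 is all zeros (a match may start at any
    position), and the answer is the minimum of the last row (it may end
    anywhere).
    """
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitute
                         prev[j] + 1,         # delete a pattern character
                         cur[j - 1] + 1)      # insert a text character
        prev = cur
    return min(prev)


def approximate_selectivity(pattern, strings, k):
    """Exact fraction of `strings` containing a substring within edit distance k."""
    hits = sum(1 for s in strings if min_substring_edit_distance(pattern, s) <= k)
    return hits / len(strings)
```

This scan costs O(|query| × |string|) per database string, which is exactly the work a compact estimator such as an N-gram-table method is meant to avoid; the brute-force value is useful mainly as ground truth when measuring estimation error.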
Let Your Fingers do the Spelling: Implicit disambiguation of words spelled with the telephone keypad
, 1991
Abstract

One way to enter words into an interactive computer system is to spell them with the letters on a telephone keypad. Although each button has three letters, the system designer can often supply the system with enough additional information that it can select the intended word without additional input from the user. This is called implicit disambiguation. This paper examines the obstacles to implicit disambiguation and describes two different kinds of knowledge that can make it possible.
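As a hedged illustration of implicit disambiguation, the Python sketch below uses a single knowledge source, word frequency, to choose among words that share a key sequence. The abstract does not say which two kinds of knowledge the paper uses, so this is not its method. It also assumes the modern ITU-T E.161 letter layout, where 7 and 9 carry four letters, rather than the three-letters-per-key pads the abstract describes; the lexicon and its counts are hypothetical.

```python
from collections import defaultdict

# assumed ITU-T E.161 letter layout
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Digits a user would press to spell `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(lexicon):
    """Group a hypothetical word -> frequency lexicon by key sequence,
    most frequent word first within each group."""
    index = defaultdict(list)
    for word, freq in lexicon.items():
        index[key_sequence(word)].append((freq, word))
    for candidates in index.values():
        candidates.sort(reverse=True)
    return index


def disambiguate(keys, index):
    """Most frequent lexicon word spelled by `keys`, or None if there is none."""
    candidates = index.get(keys)
    return candidates[0][1] if candidates else None
```

For example, good, home, gone and hood all spell 4663, so with counts such as {"good": 50, "home": 40, "gone": 20, "hood": 5} the function returns "good"; recovering the other words would need knowledge beyond single-word frequency.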
Information theory and learning: a physical approach
, 2000
Abstract

We try to establish a unified information-theoretic approach to learning and to explore some of its applications. First, we define predictive information as the mutual information between the past and the future of a time series, discuss its behavior as a function of the length of the series, and explain how other quantities of interest studied previously in learning theory, as well as in dynamical systems and statistical mechanics, emerge from this universally definable concept. We then prove that predictive information provides the unique measure for the complexity of dynamics underlying the time series and show that there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before. Further, we investigate numerically the learning of a nonparametric probability density, which is an example of a problem with power-law complexity, and show that the proper Bayesian formulation of this problem provides for the 'Occam' factors that punish overly complex models and thus allow one to learn not only a solution within a specific model class, but also the class itself using the data.
Mixing, entropy and competition
 Physica Scripta
, 2012
Abstract

Non-traditional thermodynamics, applied to random behaviour associated with turbulence, mixing and competition, is reviewed and analysed. Competitive mixing represents a general framework for the study of generic properties of competitive systems and can be used to model a wide class of non-equilibrium phenomena ranging from turbulent premixed flames and invasion waves to complex competitive systems. We demonstrate the consistency of the general principles of competition with the thermodynamic description, review and analyse the related entropy concepts and introduce the corresponding competitive H-theorem. A competitive system can be characterized by a thermodynamic quantity, the competitive potential, which determines the likely direction of evolution of the system. Contested resources tend to move between systems from lower to higher values of the competitive potential. There is, however, an important difference between conventional thermodynamics and competitive thermodynamics. While conventional thermodynamics is constrained by its zeroth law and is fundamentally transitive, the transitivity of competitive thermodynamics depends on the transitivity of the competition rules. Intransitivities are common in the real world and are responsible for complex behaviour in competitive systems. This work follows ideas and methods that originated in the analysis of turbulent combustion, but reviews a much broader scope of issues linked to mixing and competition, including the thermodynamic characterization of complex competitive systems with self-organization. The approach presented here is interdisciplinary and is addressed to the general educated reader, whereas the mathematical details can be found in the appendices.