
## Local search with very large-scale neighborhoods for optimal permutations in machine translation (2006)

Venue: In Proc. of the Workshop on Computationally Hard Problems and Joint Inference

Citations: 13 (3 self)

### Citations

1555 | The mathematics of statistical machine translation: Parameter estimation.
- Brown, Pietra, et al.
- 1993
Citation Context ...core for all π. Thus, even if there exists a good model of how French reorders into English, say, there may not exist as good a model of the reverse. (This is also true of the IBM translation models (Brown et al., 1993).) In practice, however, we hope that (1) is flexible enough to admit reasonable models in both directions. 2.3 Relation to difficult classical problems Consider the special case of equation (1) wher...

670 | Inducing features of random fields
- Pietra, Pietra, et al.
- 1997
Citation Context ...ction, which we would need for direct optimization. Nor can we compute the expected feature counts under the distribution (13), which we would need for the improved iterative scaling algorithm (Della Pietra et al., 1997). However, we could compute the expectations approximately by sampling permutations as in Section 7. A second strategy is to approximate the above training criterion and use the neighborhood partitio...

646 | Greedy randomized adaptive search procedures
- Resende, Ribeiro
- 2001
Citation Context ...to produce an initial permutation that is then improved by local search. Indeed, for many problems, (randomized) greedy initialization of local search is a highly effective technique, known as GRASP (Feo and Resende, 1995). Another hybrid possibility is to run our local search algorithm but fall back to a beam-search decoder for all narrow spans (say, all Pijqs with j − i ≤ 8). 7 Stochastic Methods We can interpret ou...

571 | Stochastic inversion transduction grammars and bilingual parsing of parallel corpora.
- Wu
- 1997
Citation Context ...ighborhood” N (π) was independently proposed for the TSP by Deĭneko and Woeginger (2000). It is also exactly the set of sequences π ′ that could be related to π by an Inversion Transduction Grammar (Wu, 1997). But unlike these authors, when scoring these sequences, we will consider arbitrary A, B, and C costs. To find the minimum-cost permutation in N (π), we construct a “parse chart” over π that builds ...

521 | Large margin classification using the perceptron algorithm
- Freund, Schapire
- 1999
Citation Context ...neighborhood, this contrastive estimation strategy (Smith and Eisner, 2005) may be a good approximation. 23 A third strategy is to train discriminatively using, for example, the perceptron algorithm (Freund and Schapire, 1998). We can use our iterated local search with the current model parameters, arriving 22 The training is supervised if we only use B and C costs, but unsupervised in general. The reason is that even tho...

392 | Finite-state transducers in language and speech processing.
- Mohri
- 1997
Citation Context ...ting Af ′ may contain duplicate arcs of different weights, or other arcs that cannot appear on any optimal path, it may be beneficial to reduce its size by an operation such as local determinization (Mohri, 1997). 3.6 Modeling translation costs Suppose we wish to construct Align as in section 3.3. A simple approach would be a singleparameter geometric distortion model. Here, Align is a bigram model that cons...

383 | Non-projective dependency parsing using spanning tree algorithms.
- McDonald, Pereira, et al.
- 2005
Citation Context ..., noting its NP-completeness (Knight, 1999; Udupa and Maji, 2006). Other recent work has similarly reduced NLP problems to previously studied formal problems (Roth and Yih, 2005; Taskar et al., 2005; McDonald et al., 2005; Tromble and Eisner, 2006). Those papers drew on exact solution methods. We instead draw on another rich vein of literature— heuristic strategies with some chance of search error. 14 This may unfortu...

219 | A dynamic programming approach to sequencing problems.
- Held, Karp
- 1962

162 | Hidden Markov model induction by Bayesian model merging.
- Stolcke, Omohundro
- 1993

160 | Contrastive estimation: Training log-linear models on unlabeled data.
- Smith, Eisner
- 2005
Citation Context ...e the conditional likelihood of π ∗ given its own neighborhood N (π ∗ ) (perhaps even pruning the neighborhood for speed). For a sufficiently large neighborhood, this contrastive estimation strategy (Smith and Eisner, 2005) may be a good approximation. 23 A third strategy is to train discriminatively using, for example, the perceptron algorithm (Freund and Schapire, 1998). We can use our iterated local search with the ...

150 | Decoding complexity in word-replacement translation models.
- Knight
- 1999
Citation Context ...ure bigram automaton model A (the TSP) or a pure beforeness model B (the LOP). Germann et al. (2001) reduce such a problem in statistical MT to integer linear programming, noting its NP-completeness (Knight, 1999; Udupa and Maji, 2006). Other recent work has similarly reduced NLP problems to previously studied formal problems (Roth and Yih, 2005; Taskar et al., 2005; McDonald et al., 2005; Tromble and Eisner,...

137 | Fast Decoding and Optimal Decoding for Machine Translation.
- Germann, Jahr, et al.
- 2001
Citation Context ...put can then be translated by a simple finite-state transducer. 3. Approach: We propose seeking the optimal permutation via a “local search” strategy of iteratively improving the current permutation (Germann et al., 2001). This provides an alternative to the usual beam search methods. 4. Decoding algorithm: We show how this local search can consider an exponentially large set of candidate improvements at each iterati...

100 | Parsing Inside-Out.
- Goodman
- 1998
Citation Context ...exity, although it does not allow speedups like A*. The inside algorithm also produces a parse forest from which we can sample permutations of N (π) using the usual top-down parse sampling algorithm (Goodman, 1998, pp. 146–147). π′ ∈ N (π) is selected with probability proportional to wf′(π′) (i.e., wf′(π′) divided by equation (15)). The only wrinkle is that these methods sum and sample over the set of twist...

98 | A discriminative matching approach to word alignment.
- Taskar, Simon, et al.
- 2005
Citation Context ...er linear programming, noting its NP-completeness (Knight, 1999; Udupa and Maji, 2006). Other recent work has similarly reduced NLP problems to previously studied formal problems (Roth and Yih, 2005; Taskar et al., 2005; McDonald et al., 2005; Tromble and Eisner, 2006). Those papers drew on exact solution methods. We instead draw on another rich vein of literature— heuristic strategies with some chance of search err...

95 | A* parsing: Fast exact viterbi parse selection.
- Klein, Manning
- 2003
Citation Context ...obably the simplest is to discard subpaths that cost more than π. This still keeps π itself (which is in the neighborhood) as well as any π ′ that improve on π. A better “safe” speedup is A* parsing (Klein and Manning, 2003), which assembles constituents in a prioritized order and can stop as soon as a full parse is found. Defining the priority of a constituent Pijqs requires an admissible (i.e., optimistic) bound on it...

90 | Integer linear programming inference for conditional random fields.
- Roth, Yih
- 2005
Citation Context ...tistical MT to integer linear programming, noting its NP-completeness (Knight, 1999; Udupa and Maji, 2006). Other recent work has similarly reduced NLP problems to previously studied formal problems (Roth and Yih, 2005; Taskar et al., 2005; McDonald et al., 2005; Tromble and Eisner, 2006). Those papers drew on exact solution methods. We instead draw on another rich vein of literature— heuristic strategies with some...

77 | New figures of merit for best-first probabilistic chart parsing.
- Caraballo, Charniak
- 1998
Citation Context ...g can be applied directly to our algorithm. For example, we can heuristically prune constituents (e.g., for a given i, j, discard all but the lowest-cost constituents Pijqs) or use best-first search (Caraballo and Charniak, 1998). While these methods do risk missing the best parse, we are in the midst of a nonoptimal greedy local search anyway. Missing some parses only means that we are not searching quite as large a neighbo...

60 | Efficient Normal-Form Parsing for Combinatory Categorial Grammars.
- Eisner
- 1996
Citation Context ...ee on π containing no swap nodes will define π itself. The solution is to eliminate this spurious ambiguity in the parser, at a small constant-time overhead, by recovering only the normal-form trees (Eisner, 1996; Zens and Ney, 2003). Each permutation in the neighborhood is characterized by only one normal-form tree. Now, to sample from the entire space of permutations, rather than just the subset in one of o...

58 | Learning evaluation functions to improve optimization by local search
- Boyan, Moore
- 2000
Citation Context ... procedure is trainable even for a fixed model. We could adapt the parameters of the model to the search procedure (Daumé III and Marcu, 2005), adapt the objective function consulted by local search (Boyan and Moore, 2000), or use reinforcement learning to learn separate parameters for search (Nareyek, 2003). Reinforcement learning is particularly attractive because it could also learn to choose a type of neighborhood...

49 | Translation with finite-state devices.
- Knight, Al-Onaizan
- 1998

43 | Novel reordering approaches in phrase-based statistical machine translation.
- Kanthak, Vilar, et al.
- 2005
Citation Context ...pping swaps in π to be carried out as a single move. The best move in this neighborhood may again be found by dynamic programming. 21 Dynasearch neighborhoods are similar to the local reorderings of (Kanthak et al., 2005) and (Kumar and Byrne, 2005) (see section 6). However, local search can iterate them in order to explore permutations that are further afield. While dynasearch neighborhoods are considerably smaller ...

41 | Learning as search optimization: Approximate large margin methods for structured prediction.
- Daumé III, Marcu
- 2005

41 | CDer: Efficient MT evaluation using block movements.
- Leusch, Ueffing, et al.
- 2006

41 | Choosing search heuristics by Non-Stationary reinforcement learning.
- Nareyek
- 2001
Citation Context ...he search procedure (Daumé III and Marcu, 2005), adapt the objective function consulted by local search (Boyan and Moore, 2000), or use reinforcement learning to learn separate parameters for search (Nareyek, 2003). Reinforcement learning is particularly attractive because it could also learn to choose a type of neighborhood for the next search step, based on the search history. 9 Experiments 9.1 Speed and acc...

39 | On approximation preserving reductions: Complete problems and robust measures.
- Orponen, Mannila
- 1987
Citation Context ...ns in economics, sociology, graph theory, archaeology, and task scheduling (Grötschel et al., 1984), as well as 1 Namely, the weighted Hamiltonian path problem. 2 MIN-TSP is known to be NPO-complete (Orponen and Mannila, 1987), and MIN-LOP is conjectured to be outside APX (Mishra and Sikdar, 2004). We are unaware of previous work on C(π), but it is at least as hard as B(π), i.e., the LOP. graph drawing (Eades ...

34 | Drawing graphs in two layers
- Eades, Whitesides
- 1994
Citation Context ..., 1987), and MIN-LOP is conjectured to be outside APX (Mishra and Sikdar, 2004). We are unaware of previous work on C(π), but it is at least as hard as B(π), i.e., the LOP. graph drawing (Eades and Whitesides, 1994). Other useful NP-complete problems, such as the weighted feedback arc set or acyclic subgraph problem, reduce trivially to LOP (Grötschel et al., 1984). Within natural language processing, our model...

28 | A study of exponential neighborhoods for the travelling salesman problem and for the quadratic assignment problem.
- Deineko, Woeginger
- 2000
Citation Context ...his local search can consider an exponentially large set of candidate improvements at each iteration, efficiently searching this set by dynamic programming. This has previously been done for the TSP (Deĭneko and Woeginger, 2000), but our new algorithm handles the full ABC model. 5. Speedups: We discuss how to speed up our dynamic programming algorithm, which resembles CKY parsing, by applying techniques from natural-languag...

23 | Analysis, statistical transfer, and synthesis in machine translation
- Brown, Pietra, et al.
- 1992
Citation Context ...ng French (f) into a “French-prime” (e ′ ) that still uses French words but can be more easily— indeed monotonically—translated into English (e). This recalls heuristic “analysis-transfer-synthesis” (Brown et al., 1992; Nießen and Ney, 2001), which makes the statistical transfer step easier to learn by linguistically hand-crafting a deterministic preprocessor (analysis) and postprocessor (synthesis). Our approach m...

22 | The approximate solution of the traveling salesman problem by a local algorithm that searches neighborhoods of exponential cardinality in quadratic time
- Sarvanov, Doroshko
- 1981
Citation Context ...N. 5.1 Lopsided twisted-sequence neighborhoods We can optionally reduce the N 3 factor by considering only right-branching trees. This corresponds to the “pyramidal” neighborhood used for the TSP by (Sarvanov and Doroshko, 1981). Better yet, consider the neighborhood defined by “asymmetrically branching” trees: Fix a small constant h, and when assembling constituents in (9)– (10), only consider triples (i, j, k) such that (...

16 | An Effective Heuristic Algorithm for the Travelling-Salesman Problem.
- Lin, Kernighan
- 1973
Citation Context ...eplace π with its lowest-cost neighbor, iterating until no further local improvement is possible. Local search has previously been used for SMT decoding (Germann et al., 2001), 17 as well as the TSP (Lin and Kernighan, 1973) and the LOP (Congram, 2000). In general, each step considers some neighborhood of candidate solutions that are derived from the current candidate. It greedily moves to the lowest-cost neighbor of th...

11 | A fast finite-state relaxation method for enforcing global constraints on sequence decoding
- Tromble, Eisner
- 2006
Citation Context ...eness (Knight, 1999; Udupa and Maji, 2006). Other recent work has similarly reduced NLP problems to previously studied formal problems (Roth and Yih, 2005; Taskar et al., 2005; McDonald et al., 2005; Tromble and Eisner, 2006). Those papers drew on exact solution methods. We instead draw on another rich vein of literature— heuristic strategies with some chance of search error. 14 This may unfortunately reduce the number o...

8 | Multi-Document Statistical Fact Extraction and Fusion
- Mann
- 2006
Citation Context ...es well under a small language model derived from a set of reference translations. Information extraction systems may wish to reconstruct a temporal order for the extracted events (Mani et al., 2003; Mann, 2006); resolving conflicting cues to the true order may be treated as an instance of the LOP (Glover et al., 1974). Phonology learning involves the computationally hard problem of choosing a rule ordering...

8 | Computational complexity of statistical machine translation
- Udupa, Maji
- 2006
Citation Context ...omaton model A (the TSP) or a pure beforeness model B (the LOP). Germann et al. (2001) reduce such a problem in statistical MT to integer linear programming, noting its NP-completeness (Knight, 1999; Udupa and Maji, 2006). Other recent work has similarly reduced NLP problems to previously studied formal problems (Roth and Yih, 2005; Taskar et al., 2005; McDonald et al., 2005; Tromble and Eisner, 2006). Those papers d...

7 | Optimal weighted ancestry relationships
- Glover, Klastorin, et al.
- 1974
Citation Context ...ction systems may wish to reconstruct a temporal order for the extracted events (Mani et al., 2003; Mann, 2006); resolving conflicting cues to the true order may be treated as an instance of the LOP (Glover et al., 1974). Phonology learning involves the computationally hard problem of choosing a rule ordering or constraint ranking (Eisner, 2000); our “ABC” cost model might be used here to approximate the true cost o...

5 | Dynasearch: Iterative local improvement by dynamic programming, Part 1: The traveling salesman problem.
- Potts, van de Velde
- 1995

3 | Using grammars to generate very large scale neighborhoods for the traveling salesman problem and other sequencing problems
- Bompadre, Orlin
- 2005
Citation Context ....83 N N! = ln N!/ ln(c · 5.83 N ) ≈ N ln N/N ln 5.83 ≈ 0.39 log 2 N. 5.2 Very large-scale finite-state neighborhoods So-called “dynasearch” neighborhoods (Potts and van de Velde, 1995; Congram, 2000; Bompadre and Orlin, 2005) allow any collection of nonoverlapping swaps in π to be carried out as a single move. The best move in this neighborhood may again be found by dynamic programming. 21 Dynasearch neighborhoods are si...

3 | On approximability of linear ordering and related NP-optimization problems on graphs
- Mishra, Sikdar
- 2004
Citation Context ... (Grötschel et al., 1984), as well as 1 Namely, the weighted Hamiltonian path problem. 2 MIN-TSP is known to be NPO-complete (Orponen and Mannila, 1987), and MIN-LOP is conjectured to be outside APX (Mishra and Sikdar, 2004). We are unaware of previous work on C(π), but it is at least as hard as B(π), i.e., the LOP. graph drawing (Eades and Whitesides, 1994). Other useful NP-complete problems, such as the we...

2 | Generating anagrams from multiple core strings employing user-defined vocabularies and orthographic parameters
- Jordan, Monteiro
- 2003
Citation Context ... model might be used here to approximate the true cost of an ordering or ranking. Finally, one might wish to find multi-word or non-word anagrams that score well under a language model (Morton, 1987; Jordan and Monteiro, 2003). We now discuss one application in detail: statistical machine translation. 3 Input Reordering for Translation 3 Word reordering is a central part of statistical machine translation (SMT). We show t...

1 | Dynamic programming and the representation of error-correcting codes
- Geman, Kochanek
- 2001
Citation Context ... any r-labeled arc in A from any q ∈ ¯q to any s ∈ ¯s. Now A ′ provides an admissible estimate of A, and a succession of automata A, A ′ , A ′′ , . . . can be used for an exact coarse-to-fine search (Geman and Kochanek, 2001). An independent type of A* heuristic would similarly coarsen the beam-search automaton in section 6 below, then obtain Viterbi forward/backward estimates. Here, coarsening relaxes the requirement th...

1 | Recursion + data structures = anagrams
- Morton
- 1987
Citation Context ...our “ABC” cost model might be used here to approximate the true cost of an ordering or ranking. Finally, one might wish to find multi-word or non-word anagrams that score well under a language model (Morton, 1987; Jordan and Monteiro, 2003). We now discuss one application in detail: statistical machine translation. 3 Input Reordering for Translation 3 Word reordering is a central part of statistical machine t...