HC-Search: Learning Heuristics and Cost Functions for Structured Prediction
Citations: 2 (0 self)
Citations
3374 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data - Lafferty, McCallum, et al. - 2001 |
584 | Max-margin Markov networks - Taskar, Guestrin, et al. - 2003 |
440 | Support vector machine learning for interdependent and structured output spaces - Tsochantaridis, Hofmann, et al. - 2004 |
Citation Context ...g the Argmin problem is intractable except in limited cases such as when the dependency structure among features forms a tree (Lafferty, McCallum, and Pereira 2001; Taskar, Guestrin, and Koller 2003; Tsochantaridis et al. 2004). For more complex structures, heuristic inference methods such as loopy belief propagation and variational inference have shown some success in practice. However, the learning algorithms generally a... |
421 | Online passive-aggressive algorithms - Crammer, Dekel, et al. - 2006 |
Citation Context ...ven to a learning algorithm to learn the weights of the heuristic function. In our implementation we employed the margin-scaled variant of the online Passive-Aggressive algorithm (see Equation 47 in (Crammer et al. 2006)) as our base learner to train the heuristic function, which we have found to be quite effective yet efficient. If we can learn a function H from hypothesis space H that is consistent with these rank... |
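The margin-scaled Passive-Aggressive update referred to in this excerpt can be sketched as follows. This is an illustrative PA-I-style ranking update, not the paper's exact formulation: the function name, the list-based feature vectors, the sqrt-of-loss margin scaling, and the aggressiveness bound C are all assumptions made for the sketch.

```python
# Sketch of a margin-scaled Passive-Aggressive (PA-I style) ranking update,
# in the spirit of Crammer et al. (2006). All names and the sqrt(loss)
# margin scaling are illustrative assumptions.
import math

def pa_rank_update(w, phi_good, phi_bad, loss, C=1.0):
    """Update weights w so that phi_good scores higher than phi_bad
    by a margin scaled with sqrt(loss)."""
    diff = [g - b for g, b in zip(phi_good, phi_bad)]
    margin = sum(wi * di for wi, di in zip(w, diff))
    hinge = max(0.0, math.sqrt(loss) - margin)    # violated margin, if any
    norm_sq = sum(d * d for d in diff)
    if hinge == 0.0 or norm_sq == 0.0:
        return w                                  # constraint already satisfied
    tau = min(C, hinge / norm_sq)                 # PA-I capped step size
    return [wi + tau * di for wi, di in zip(w, diff)]
```

With a zero weight vector, one update on the pair ([1, 0], [0, 1]) at loss 1 moves the weights so the preferred example scores strictly higher.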
325 | Discriminative reranking for natural language parsing - Collins, Koo - 2005 |
140 | Approximate policy iteration with a policy language bias - Fern, Yoon, et al. - 2003 |
110 | Semantic modeling of natural scenes for content-based image retrieval - Vogel, Schiele - 2007 |
Citation Context ...me. This is similar to NETtalk Stress except that the task is to assign one of the 51 phoneme labels to each letter of the word. 4) Scene labeling. This dataset contains 700 images of outdoor scenes (Vogel and Schiele 2007). Each image is divided into patches by placing a regular grid of size 10 × 10, and each patch takes one of the 9 semantic labels (sky, water, grass, trunks, foliage, field, rocks, flowers, sand). Sim... |
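The structured output described in this excerpt can be encoded directly as a grid of patch labels. A minimal sketch, assuming the 10 × 10 grid and 9 label names from the dataset description; the function names and the per-patch Hamming loss are illustrative choices, not the paper's implementation.

```python
# Illustrative encoding of the scene-labeling output: a 10 x 10 grid of
# patches, each assigned one of the 9 semantic labels named above.
LABELS = ["sky", "water", "grass", "trunks", "foliage",
          "field", "rocks", "flowers", "sand"]

def blank_scene(rows=10, cols=10, fill="sky"):
    """A structured output y for one image: rows x cols patch labels."""
    assert fill in LABELS
    return [[fill] * cols for _ in range(rows)]

def hamming_loss(y, y_star):
    """Fraction of patches where y disagrees with the ground truth y*."""
    total = sum(len(row) for row in y)
    wrong = sum(a != b for ra, rb in zip(y, y_star) for a, b in zip(ra, rb))
    return wrong / total
```

Changing a single patch out of the 100 gives a loss of 0.01, which matches the per-element error measures commonly used for such grid-labeling tasks.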
99 | Robust trainability of single neurons - Hoffgen, Simon, et al. - 1995 |
65 | A reduction of imitation learning and structured prediction to no-regret online learning - Ross, Gordon, et al. - 2011 |
59 | Learning to take actions - Khardon - 1999 |
Citation Context ...a. Further, given assumptions on the base learning algorithm (e.g. PAC), generic imitation learning results can be used to give generalization guarantees on the performance of search on new examples (Khardon 1999; Fern, Yoon, and Givan 2006; Ross and Bagnell 2010). Our experiments show that the simple approach described above performs extremely well on our problems. Above we noted that we only need to colle... |
45 | A comparison of ID3 and backpropagation for English text-to-speech mapping - Dietterich, Hild, et al. - 1995 |
43 | Structured prediction cascades - Weiss, Taskar - 2010 |
Citation Context ...nd Bagnell 2011). Unfortunately, in many problems, some decisions are difficult to make by a greedy classifier, but are crucial for good performance. Cascade models (Felzenszwalb and McAllester 2007; Weiss and Taskar 2010; Weiss, Sapp, and Taskar 2010) achieve efficiency by performing inference on a sequence of models from coarse to finer levels. However, cascading places strong restrictions on the form of the cost fu... |
42 | Efficient reductions for imitation learning - Ross, Bagnell - 2010 |
Citation Context ...learning algorithm (e.g. PAC), generic imitation learning results can be used to give generalization guarantees on the performance of search on new examples (Khardon 1999; Fern, Yoon, and Givan 2006; Ross and Bagnell 2010). Our experiments show that the simple approach described above performs extremely well on our problems. Above we noted that we only need to collect and learn to imitate the “sufficient” pairwise d... |
30 | Web-search ranking with initialized gradient boosted regression trees - Mohan, Chen, et al. - 2011 |
24 | On learning linear ranking functions for beam search - Xu, Fern - 2007 |
23 | Sidestepping intractable inference with structured ensemble cascades - Weiss, Sapp, et al. - 2010 |
20 | SampleRank: Training factor graphs with atomic gradients - Wick, Rohanimanesh, et al. - 2011 |
Citation Context ... form of the cost functions. We are inspired by the recent successes of output-space search approaches, which place few restrictions on the form of the cost function (Doppa, Fern, and Tadepalli 2012; Wick et al. 2011). These methods learn and use a cost function to direct a combinatorial search through a space of outputs, and return the least cost output uncovered. While these approaches have achieved state-of-th... |
19 | SpeedBoost: Anytime prediction with uniform near-optimality - Grubb, Bagnell - 2012 |
Citation Context ...tputs produced by the heuristic. This analysis suggests that learning more powerful cost functions, e.g., regression trees (Mohan, Chen, and Weinberger 2011), with an eye towards anytime performance (Grubb and Bagnell 2012; Xu, Weinberger, and Chapelle 2012) would be productive. Our results also suggested that there is room to improve overall performance with better heuristic learning. Thus, another direction to p... |
12 | Learnability of bipartite ranking functions - Agarwal, Roth - 2005 |
Citation Context ...imum cost output ŷ equals ℓbest, i.e., L(x, ŷ, y*) = ℓbest, where ŷ = arg min_{y ∈ Y_H(x)} C(x, y). We formulate the cost function training problem similarly to traditional learning to rank problems (Agarwal and Roth 2005). More specifically, we want all the best loss outputs in Y_H(x) to be ranked better than all the non-best loss outputs according to our cost function, which is a bipartite ranking problem. Let Ybest... |
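The bipartite-ranking view described in this excerpt can be sketched as pairwise updates that push every best-loss output below every non-best output under the cost function C. The perceptron-style update, the linear cost model, and the function name below are illustrative assumptions; the paper's actual base learner may differ.

```python
# Sketch of the bipartite ranking formulation of cost function training:
# each best-loss output in Y_H(x) should receive a lower cost C(x, y)
# than each non-best output. Perceptron-style update; all names are
# illustrative assumptions.

def cost_rank_updates(w, best_feats, nonbest_feats, eta=0.1):
    """One pass of pairwise updates over the bipartite preference graph."""
    for fb in best_feats:            # outputs achieving the best loss
        for fn in nonbest_feats:     # all other generated outputs
            # cost = <w, features>; violation if best is not ranked better
            if sum(wi * (bi - ni) for wi, bi, ni in zip(w, fb, fn)) >= 0:
                w = [wi - eta * (bi - ni) for wi, bi, ni in zip(w, fb, fn)]
    return w
```

Starting from zero weights, one pass over a single (best, non-best) pair already orders the pair correctly under the learned cost.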
10 | Output space search for structured prediction - Doppa, Fern, et al. - 2012 |
5 | Speedup learning - Fern - 2010 |
4 | Training factor graphs with reinforcement learning for efficient MAP inference - Wick, Rohanimanesh, et al. - 2009 |
Citation Context ...as a policy for guiding “search actions” and rewards are received for uncovering high quality outputs. In fact, this approach has been explored for structured prediction in the case of greedy search (Wick et al. 2009) and was shown to be effective given a carefully designed reward function and action space. While this is a viable approach, general purpose RL can be quite sensitive to the algorithm parameters and ... |
1 | The generalized A* architecture. JAIR - Felzenszwalb, McAllester - 2007 |
Citation Context ...d, and Marcu 2009; Ross, Gordon, and Bagnell 2011). Unfortunately, in many problems, some decisions are difficult to make by a greedy classifier, but are crucial for good performance. Cascade models (Felzenszwalb and McAllester 2007; Weiss and Taskar 2010; Weiss, Sapp, and Taskar 2010) achieve efficiency by performing inference on a sequence of models from coarse to finer levels. However, cascading places strong restrictions on ... |
1 | Search-based structured prediction. MLJ 75(3):297–325 - Daumé III, Langford, et al. - 2009 |
1 | Learned prioritization for trading off accuracy and speed |