## Multistep regression learning for compositional distributional semantics (2013)

Citations: 29 (12 self)

### Citations

1812 | A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge
- Landauer, Dumais
- 1997
Citation Context ...kens with which words co-occur within a sentence or frame of n tokens. Such models have been successfully applied to various tasks such as thesaurus extraction (Grefenstette, 1994) and essay grading (Landauer and Dumais, 1997; Dumais, 2003). However, unlike their formal semantics counterparts, distributional models have no explicit canonical composition operation, and provide no way to integrate syntactic information into... |

450 | Generalized cross-validation as a method for choosing a good ridge parameter
- Golub, Heath, et al.
- 1979
Citation Context ...ned weight matrix $B = (X^T X + \lambda I)^{-1} X^T Y$ and produces competitive results at a faster speed. For each verb matrix or tensor to be learned, we tuned the parameter λ by generalized cross-validation (Golub et al., 1979). The objective function used for tuning minimizes least square error when predicting corpus-observed sentence vectors or intermediate VP matrices (the data sets we evaluate the models on are not touc... |
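
The closed-form ridge solution quoted in this context, with λ chosen by generalized cross-validation, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ridge_weights` and `gcv_score`, the synthetic data, and the candidate grid for λ are all hypothetical, and the multi-output GCV score here simply sums squared residuals over all output columns.

```python
import numpy as np

def ridge_weights(X, Y, lam):
    """Closed-form ridge solution B = (X^T X + lam * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    # Solve the regularized normal equations rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

def gcv_score(X, Y, lam):
    """Generalized cross-validation score for one value of lam (illustrative)."""
    n = X.shape[0]
    # Hat matrix H maps the observed targets Y to the fitted values X @ B.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    residual = Y - H @ Y
    return (np.sum(residual ** 2) / n) / (1.0 - np.trace(H) / n) ** 2

# Synthetic data (assumption): 50 training pairs, 5 input and 3 output dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
B_true = rng.normal(size=(5, 3))
Y = X @ B_true + 0.1 * rng.normal(size=(50, 3))

# Pick the lam with the lowest GCV score from a small hypothetical grid.
candidates = [0.01, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: gcv_score(X, Y, lam))
B = ridge_weights(X, Y, best_lam)
```

Adding λI to $X^T X$ keeps the system well conditioned when the input columns are nearly collinear, which is why the solve step remains stable even for small λ.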

274 | Projected Gradient Methods for Nonnegative Matrix Factorization - Lin - 2007 |

231 | ‘Über Sinn und Bedeutung’. Zeitschrift für Philosophie und Philosophische Kritik, 100. 25-50. Transl. as: ‘On sense and reference’
- Frege
- 1960
Citation Context ...lication, treating certain words as functions that operate on other words to construct meaning incrementally according to a calculus of composition that reflects the syntactic structure of sentences (Frege, 1892; Montague, 1970; Partee, 2004). Coecke et al. (2010) have proposed a general formalism for composition in distributional semantics that captures the same notion of function application. Empirical imp... |

220 | Vector-based Models of Semantic Composition. - Mitchell, Lapata - 2008 |

184 | Riemannian Manifolds: An Introduction to Curvature. Graduate Texts in Mathematics
- Lee
- 1997
Citation Context ... equivalent to the matrix multiplication: f(v) = M × v = w. In the case of multilinear maps, this correspondence generalises to a correlation between n-ary maps and rank n + 1 tensors (Bourbaki, 1989; Lee, 1997). Tensors are generalisations of vectors and matrices; they have larger degrees of freedom referred to as tensor ranks, which is one for vectors and two for matrices. To illustrate this generalisatio... |
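
The correspondence between n-ary multilinear maps and rank n + 1 tensors described in this context can be illustrated with a small NumPy sketch. The dimensions, the random values, and the variable names (`M`, `T`, `partial`) are assumptions chosen for illustration; "rank" is used here as in the quoted text, meaning the number of indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear (1-ary) map f: A -> B is a matrix, i.e. a rank-2 tensor: f(v) = M v.
M = rng.normal(size=(4, 3))          # dim B = 4, dim A = 3
v = rng.normal(size=3)
w = M @ v

# A bilinear (2-ary) map g: A x C -> B is a rank-3 tensor of shape (dim B, dim A, dim C).
T = rng.normal(size=(4, 3, 2))       # dim C = 2
u = rng.normal(size=2)

# Apply g to (v, u) by contracting T with v on the A index and u on the C index.
z = np.einsum("bac,a,c->b", T, v, u)

# Contracting one argument at a time first yields a matrix (a linear map A -> B),
# which is then applied to the remaining vector.
partial = np.einsum("bac,c->ba", T, u)
```

Supplying the arguments one at a time, as in `partial`, mirrors how a rank-3 tensor applied to one vector leaves a rank-2 tensor that still expects one more argument.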

183 | Semantic compositionality through recursive matrix-vector spaces. - Socher, Huval, et al. - 2012 |

148 | Composition in distributional models of semantics. - Mitchell, Lapata - 2010 |

128 | Extracting semantic representations from word co-occurrence statistics: Stop-lists, stemming, and SVD.
- Bullinaria, Levy
- 2012
Citation Context ... step in the construction of distributional semantic models. Extensive evidence suggests that dimensionality reduction does not affect, and might even improve the quality of lexical semantic vectors (Bullinaria and Levy, 2012; Landauer and Dumais, 1997; Sahlgren, 2006; Schütze, 1997). In our setting, dimensionality reduction is virtually necessary, since working with 10K-dimensional vectors is problematic for the Regressi... |

122 | A structured vector space model for word meaning in context.
- Erk, Padó
- 2008
Citation Context ...uture work. Several studies tackle word meaning in context, that is, how to adapt the distributional representation of a word to the specific context in which it appears (e.g., Dinu and Lapata, 2010; Erk and Padó, 2008; Thater et al., 2011). We see this as complementary rather than alternative to composition: Distributional representations of single words should first be adapted to context with these methods, and t... |

88 | Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space.
- Baroni, Zamparelli
- 2010
Citation Context ...ti-step regression learning is a generalisation of linear regression learning for tensors of rank 3 or higher, as procedures already exist for tensors of rank 1 (lexical semantic vectors) and rank 2 (Baroni and Zamparelli, 2010). For rank 1 tensors, we suggest learning vectors using any standard lexical semantic vector learning model, and present sample parameters in Section 5.1 below. Learning rank 2 tensors (matrices) can... |

86 | Mathematical foundations for a compositional distributional model of meaning.
- Coecke, Sadrzadeh, et al.
- 2010
Citation Context ...o improve Add performance in our earlier experiments. Grefenstette and Sadrzadeh (2011b) proposed a specific implementation of the general DisCoCat approach to compositional distributional semantics (Coecke et al., 2010) that we call Kronecker here. 1 We ran the experiments reported below in full space for those models for which it was possible, finding that Multiply obtained better results there (approaching those ... |

49 | The Word-Space Model.
- Sahlgren
- 2006
Citation Context ...odels. Extensive evidence suggests that dimensionality reduction does not affect, and might even improve the quality of lexical semantic vectors (Bullinaria and Levy, 2012; Landauer and Dumais, 1997; Sahlgren, 2006; Schütze, 1997). In our setting, dimensionality reduction is virtually necessary, since working with 10K-dimensional vectors is problematic for the Regression approach (see Section 5.2 below), that r... |

48 | “English as a Formal Language.” In Linguaggi nella Società e nella
- Montague
- 1970
Citation Context ...al form—by defining a systematic passage from syntactic rules to the composition of parts of logical expressions. This allows us to derive the logical form of a sentence from its syntactic structure (Montague, 1970). These models are fully compositional, whereby the meaning of a phrase is a function of the meaning of its parts; however, as they reduce meaning to logical form, they are not necessarily adapted to... |

47 | Papers in Linguistics: - Firth - 1957 |

46 | Experimental support for a categorical compositional distributional model of meaning. - Grefenstette, Sadrzadeh - 2011 |

37 | Measuring distributional similarity in context
- Dinu, Lapata
- 2010
Citation Context ... direct comparison to future work. Several studies tackle word meaning in context, that is, how to adapt the distributional representation of a word to the specific context in which it appears (e.g., Dinu and Lapata, 2010; Erk and Padó, 2008; Thater et al., 2011). We see this as complementary rather than alternative to composition: Distributional representations of single words should first be adapted to context with ... |

37 | A regression model of adjective-noun compositionality in distributional semantics.
- Guevara
- 2010
Citation Context ...2009). RR, also known as L2 regularized regression, is a different approach from the Partial Least Square Regression (PLSR) method that was used in previous related work (Baroni and Zamparelli, 2010; Guevara, 2010) to deal with the multicollinearity problem. When multicollinearity exists, the matrix $X^T X$ (X here is the input matrix after dimensionality reduction) becomes nearly singular and the diagonal eleme... |

28 | Data-driven approaches to information access
- Dumais
- 2003
Citation Context ...cur within a sentence or frame of n tokens. Such models have been successfully applied to various tasks such as thesaurus extraction (Grefenstette, 1994) and essay grading (Landauer and Dumais, 1997; Dumais, 2003). However, unlike their formal semantics counterparts, distributional models have no explicit canonical composition operation, and provide no way to integrate syntactic information into word meaning ... |

21 | Compositionality in Formal Semantics.
- Partee
- 2004
Citation Context ...rds as functions that operate on other words to construct meaning incrementally according to a calculus of composition that reflects the syntactic structure of sentences (Frege, 1892; Montague, 1970; Partee, 2004). Coecke et al. (2010) have proposed a general formalism for composition in distributional semantics that captures the same notion of function application. Empirical implementations of Coecke’s et al... |

15 | Experimenting with transitive verbs in a DisCoCat - Grefenstette, Sadrzadeh - 2011 |

12 | Word meaning in context: A simple and effective vector model
- Thater, Fürstenau, et al.
- 2011
Citation Context ...studies tackle word meaning in context, that is, how to adapt the distributional representation of a word to the specific context in which it appears (e.g., Dinu and Lapata, 2010; Erk and Padó, 2008; Thater et al., 2011). We see this as complementary rather than alternative to composition: Distributional representations of single words should first be adapted to context with these methods, and then composed to repre... |

6 | Ambiguity Resolution in Natural Language Learning
- Schütze
- 1997
Citation Context ... evidence suggests that dimensionality reduction does not affect, and might even improve the quality of lexical semantic vectors (Bullinaria and Levy, 2012; Landauer and Dumais, 1997; Sahlgren, 2006; Schütze, 1997). In our setting, dimensionality reduction is virtually necessary, since working with 10K-dimensional vectors is problematic for the Regression approach (see Section 5.2 below), that requires learnin... |

4 | Commutative Algebra: Chapters 1-7. Springer-Verlag (Berlin and
- Bourbaki
- 1989
Citation Context ... vector w ∈ B is equivalent to the matrix multiplication: f(v) = M × v = w. In the case of multilinear maps, this correspondence generalises to a correlation between n-ary maps and rank n + 1 tensors (Bourbaki, 1989; Lee, 1997). Tensors are generalisations of vectors and matrices; they have larger degrees of freedom referred to as tensor ranks, which is one for vectors and two for matrices. To illustrate this ge... |

1 | distributional models of meaning - Grefenstette, Sadrzadeh, et al. - 2011 |

1 | Concrete sentence spaces for compositional Grefenstette - unknown authors - 1994 |