Results 1 - 2 of 2
unknown title
"... Abstract Question answering forums are rapidly growing in size with no automated ability to refer to and reuse existing answers. In this paper, we develop a methodology for finding semantically related questions. The task is difficult since 1) key pieces of information are often buried in extraneou ..."
Abstract
Question answering forums are rapidly growing in size with no automated ability to refer to and reuse existing answers. In this paper, we develop a methodology for finding semantically related questions. The task is difficult since 1) key pieces of information are often buried in extraneous details in the question body and 2) available annotations are scarce and fragmented, driven largely by participants. We design a novel combination of recurrent and convolutional models (gated convolutions) to effectively map questions to their semantic representations. The models are pre-trained within an encoder-decoder framework (from body to title) on the basis of the entire raw corpus, and fine-tuned discriminatively from limited annotations. Our evaluation demonstrates that our model yields a 10% gain over a standard IR baseline, and a 5% gain over standard neural network architectures (including CNNs, LSTMs and GRUs) trained analogously.

Introduction

Question answering (QA) forums such as Stack Exchange are rapidly expanding and already contain millions of questions. The expanding scope and coverage of these forums often leads to many duplicate and interrelated questions, resulting in the same questions being answered multiple times. By identifying similar questions, we can potentially reuse existing answers, reducing response times and unnecessary repeated work. Unfortunately, in most forums, the process of identifying and referring to existing similar questions is done manually by forum participants with limited, scattered success. The task of automatically retrieving questions similar to a given user's question has recently attracted significant attention and has become a testbed for various representation learning approaches. Several factors make the problem difficult. First, submitted questions are often long and contain extraneous information irrelevant to the main question being asked. For instance, the first question in ...
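The abstract above describes mapping questions to semantic vectors with gated convolutions. As a rough illustration of the gating idea only (not the authors' implementation; PyTorch is assumed, and names such as GatedConvEncoder and hidden_dim are invented here), a question encoder might look like this:

    import torch
    import torch.nn as nn

    class GatedConvEncoder(nn.Module):
        """Map a sequence of word embeddings to a single question vector
        using convolutional features modulated by learned gates."""

        def __init__(self, embed_dim, hidden_dim, kernel_size=3):
            super().__init__()
            pad = kernel_size // 2
            # One convolution proposes candidate features, the other gates them.
            self.conv_state = nn.Conv1d(embed_dim, hidden_dim, kernel_size, padding=pad)
            self.conv_gate = nn.Conv1d(embed_dim, hidden_dim, kernel_size, padding=pad)

        def forward(self, embeddings):
            # embeddings: (batch, seq_len, embed_dim) -> (batch, embed_dim, seq_len)
            x = embeddings.transpose(1, 2)
            state = torch.tanh(self.conv_state(x))   # candidate features per position
            gate = torch.sigmoid(self.conv_gate(x))  # how much each position passes through
            gated = state * gate                     # element-wise gating
            return gated.mean(dim=2)                 # average-pool into one semantic vector

    # Toy usage: encode a batch of 8 questions of 50 tokens with 200-d embeddings.
    encoder = GatedConvEncoder(embed_dim=200, hidden_dim=300)
    vectors = encoder(torch.randn(8, 50, 200))       # shape (8, 300)

In the setup the abstract describes, such an encoder would first be pre-trained body-to-title on the raw corpus and then fine-tuned on the limited similarity annotations; the pooled vectors of two questions could then be compared, e.g., with cosine similarity.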
Algorithms, Experimentation
"... In this paper, we extensively study the use of syntactic and semantic structures obtained with shallow and deeper syntactic parsers in the answer passage reranking task. We propose several dependency-based structures enriched with Linked Open Data (LD) knowledge for representing pairs of questions a ..."
Abstract
In this paper, we extensively study the use of syntactic and semantic structures obtained with shallow and deeper syntactic parsers in the answer passage reranking task. We propose several dependency-based structures enriched with Linked Open Data (LD) knowledge for representing pairs of questions and answer passages. We use such tree structures in learning to rank (L2R) algorithms based on tree kernels. The latter can represent questions and passages in a tree fragment space, where each substructure represents a powerful syntactic/semantic feature. Additionally, since we define links between structures, tree kernels also generate relational features spanning question and passage structures. We derive very important findings, which can be useful to build state-of-the-art systems: (i) full syntactic dependencies can outperform shallow models also using external knowledge and (ii) the semantic information should be derived from effective and high-coverage resources, e.g., LD, and incorporated into syntactic structures to be effective. We demonstrate our findings by carrying out an extensive comparative experimentation on two different TREC QA corpora and one community question answering dataset, namely Answerbag. Our comparative analysis on well-defined answer selection benchmarks consistently demonstrates that our structural semantic models largely outperform the state of the art in passage reranking.
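The abstract above relies on tree kernels, which implicitly compare all tree fragments of two syntactic structures. As a rough, self-contained illustration (a simplified subset-tree-style kernel over trees encoded as nested (label, child, ...) tuples; this is not the authors' reranking system), the fragment-counting recursion looks like this:

    # Trees are nested tuples: (label, child_1, child_2, ...); a leaf is (label,).

    def production(node):
        """A node's label together with the labels of its children."""
        return (node[0], tuple(child[0] for child in node[1:]))

    def nodes(tree):
        """All nodes of a tree, including the root."""
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

    def delta(n1, n2, lam=0.4):
        """Decayed count of common fragments rooted at n1 and n2."""
        if production(n1) != production(n2):
            return 0.0
        if len(n1) == 1:              # matching leaves
            return lam
        score = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            score *= 1.0 + delta(c1, c2, lam)
        return score

    def tree_kernel(t1, t2, lam=0.4):
        """Similarity of two trees in the (implicit) tree-fragment space."""
        return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

    # Toy example: two tiny dependency-style trees sharing some structure.
    q = ("root", ("what",), ("is",), ("X",))
    p = ("root", ("what",), ("was",), ("X",))
    print(tree_kernel(q, p))          # > 0 because the trees share fragments

In the reranking setting the abstract describes, such a kernel would be plugged into a kernel-based learning-to-rank algorithm over pairs of question and answer-passage structures, so that shared syntactic/semantic fragments act as features without being enumerated explicitly.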