Coupled term-term relation analysis for document clustering
- In Proceedings of IJCNN2013. Cheng, A. Six degrees of separation, Twitter style
, 2013
Cited by 3 (1 self)
Abstract—Traditional document clustering approaches are usually based on the Bag of Words model, which is limited by its assumption of the independence among terms. Recent strategies have been proposed to capture the relation between terms based on statistical analysis, and they estimate the relation between terms purely by their co-occurrence across the documents. However, the implicit interactions with other link terms are overlooked, which leads to the discovery of incomplete information. This paper proposes a coupled term-term relation model for document representation, which considers both the intra-relation (i.e. co-occurrence of terms) and inter-relation (i.e. dependency of terms via link terms) between a pair of terms. The coupled relation for each pair of terms is further used to map a document onto a new feature space, which includes more semantic information. Substantial experiments verify that the document clustering incorporated with our proposed relation achieves a significant performance improvement compared to the state-of-the-art techniques.
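The abstract does not give the formulas, so the following is only a minimal sketch of the idea of combining an intra-relation with an inter-relation. Every concrete choice here is an assumption made for illustration, not the paper's method: the intra-relation is taken to be the Jaccard co-occurrence of two terms across documents, the inter-relation is the strongest path through a single link term, and the two are mixed with a hypothetical weight `alpha`. All function names are invented for this example.

```python
from itertools import combinations

def intra_relation(docs, vocab):
    """Assumed intra-relation: Jaccard co-occurrence of two terms across documents."""
    doc_sets = {t: {i for i, d in enumerate(docs) if t in d} for t in vocab}
    rel = {(t, t): 1.0 for t in vocab}
    for a, b in combinations(vocab, 2):
        union = doc_sets[a] | doc_sets[b]
        score = len(doc_sets[a] & doc_sets[b]) / len(union) if union else 0.0
        rel[(a, b)] = rel[(b, a)] = score
    return rel

def inter_relation(intra, vocab):
    """Assumed inter-relation: best min-strength path a -> link term -> b."""
    rel = {}
    for a in vocab:
        for b in vocab:
            if a == b:
                rel[(a, b)] = 0.0
                continue
            links = [min(intra[(a, c)], intra[(c, b)])
                     for c in vocab if c not in (a, b)]
            rel[(a, b)] = max(links, default=0.0)
    return rel

def coupled_relation(docs, vocab, alpha=0.5):
    """Mix both relations; alpha is a hypothetical weighting parameter."""
    intra = intra_relation(docs, vocab)
    inter = inter_relation(intra, vocab)
    return {
        (a, b): 1.0 if a == b
        else alpha * intra[(a, b)] + (1 - alpha) * inter[(a, b)]
        for a in vocab for b in vocab
    }

def map_document(doc, vocab, coupled):
    """Project a term-frequency vector onto the relation-weighted feature space."""
    tf = {t: doc.count(t) for t in vocab}
    return {t: sum(tf[s] * coupled[(s, t)] for s in vocab) for t in vocab}

# Toy corpus: "apple" and "banana" never co-occur, but both co-occur with "fruit".
docs = [["apple", "fruit"], ["fruit", "banana"], ["car", "engine"]]
vocab = sorted({t for d in docs for t in d})
coupled = coupled_relation(docs, vocab)
features = map_document(["apple"], vocab, coupled)
```

In this toy corpus a pure co-occurrence measure scores "apple" and "banana" as unrelated, while the coupled score is non-zero because "fruit" acts as a link term, so a document that mentions only "apple" still receives some weight on the "banana" feature.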
Evans: The Use of Monolingual Context Vectors for Missing Translations in Cross-Language Information Retrieval
- IJCNLP
Cited by 1 (0 self)
Abstract. For cross-language text retrieval systems that rely on bilingual dictionaries for bridging the language gap between the source query language and the target document language, good bilingual dictionary coverage is imperative. For terms with missing translations, most systems employ some approaches for expanding the existing translation dictionaries. In this paper, instead of lexicon expansion, we explore whether using the context of the unknown terms can help mitigate the loss of meaning due to missing translation. Our approaches consist of two steps: (1) to identify terms that are closely associated with the unknown source language terms as context vectors and (2) to use the translations of the associated terms in the context vectors as the surrogate translations of the unknown terms. We describe a query-independent version and a query-dependent version using such monolingual context vectors. These methods are evaluated in Japanese-to-English retrieval using the NTCIR-3 topics and data sets. Empirical results show that both methods improved CLIR performance for short and medium-length queries and that the query-dependent context vectors performed better than the query-independent versions.
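The two steps described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' system: association strength is assumed to be a raw same-sentence co-occurrence count, the function names and the `top_k` parameter are hypothetical, and the romanized tokens and dictionary entries are made-up toy data.

```python
from collections import Counter

def context_vector(term, corpus, top_k=3):
    """Step 1: terms most often seen in the same sentence as the unknown term.

    Assumption: association is a plain co-occurrence count.
    """
    counts = Counter()
    for sentence in corpus:
        if term in sentence:
            counts.update(w for w in set(sentence) if w != term)
    return counts.most_common(top_k)

def surrogate_translations(term, corpus, dictionary, top_k=3):
    """Step 2: translate the associated terms and weight them by association."""
    weighted = Counter()
    for neighbor, weight in context_vector(term, corpus, top_k):
        for translation in dictionary.get(neighbor, []):
            weighted[translation] += weight
    return weighted.most_common()

# Toy monolingual corpus; "kabu" stands in for a term missing from the dictionary.
corpus = [
    ["kabu", "shijo", "torihiki"],
    ["kabu", "shijo", "toshi"],
    ["kabu", "toshi", "ginko"],
    ["tenki", "ame"],
]
dictionary = {"shijo": ["market"], "toshi": ["investment"], "tenki": ["weather"]}
surrogates = dict(surrogate_translations("kabu", corpus, dictionary, top_k=2))
```

Here the untranslatable term is replaced in the target-language query by the weighted translations of its two strongest neighbors. This corresponds most closely to the query-independent variant; a query-dependent variant would presumably also condition the neighbor selection on the other query terms.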
Knowledge Reduction Information Retrieval Model in Pathology Medical Domain
Abstract. We present an efficient intelligent information retrieval model using reduction of domain-specific expert knowledge, demonstrating its use in the pathology medical domain. We created an information retrieval model that incorporates domain-specific knowledge to provide knowledgeable answers to users. This model converts domain-specific knowledge to a relationship of terms represented as quantitative values, which gives improved efficiency. The conversion technology, called “knowledge reduction,” enables the off-line calculation of knowledge separate from the information retrieval (IR) process. This results in the real-time processing of retrieval results. We performed a simulation of the developed intelligent IR model in the pathology medical domain. Our approach resulted in an approximately 30% performance gain, measured by average precision at 11 standard recall levels, when compared with the vector-space-model-based IR method.
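For readers unfamiliar with the metric named in the abstract, the sketch below computes average precision at the 11 standard recall levels (0.0, 0.1, ..., 1.0) for a single ranked result list, using the usual interpolation rule in which precision at a recall level is the highest precision observed at that recall or beyond. The paper's exact evaluation procedure is not given in the abstract, so this follows the standard textbook definition; the function name is hypothetical.

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """Interpolated precision averaged over recall levels 0.0, 0.1, ..., 1.0.

    ranked_relevance: booleans, True where the result at that rank is relevant.
    total_relevant: number of relevant documents in the whole collection.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # (recall, precision) measured after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at a level is the best precision at any
    # recall greater than or equal to that level (0.0 if never reached).
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)

# Relevant documents at ranks 1 and 3, two relevant documents in total.
score = eleven_point_average_precision([True, False, True, False], 2)
```

In this example the interpolated precision is 1.0 for recall levels 0.0 through 0.5 and 2/3 for levels 0.6 through 1.0, giving a score of about 0.848. A system-level figure is normally obtained by averaging this value over all test queries.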
Towards Agent Societies for Information Retrieval
Abstract – Building decentralised Information Retrieval Systems is one of the key challenges for the future Internet. These systems will most probably take the shape of open multiagent systems – societies of Information Agents that coordinate in a peer-to-peer fashion. In this paper we first analyse current IR methods so as to identify shortcomings in scalability and personalization of the information services. We then draw upon ideas from peer-to-peer technologies, to determine the requirements, and sketch the structure, of future societies of Information Agents.
A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval
In this paper, we present a new approach for automatic thesaurus construction and query expansion for document retrieval. We analyze the information between any two terms in each document cluster center of final document clusters or intermediate document clusters in the clustering process to automatically construct the thesaurus, where this information includes the co-occurrence frequency of any two terms in each document cluster center, the degree of effect of each term in each document cluster center, and the inner noise of each document cluster. We also present a query expansion method to expand the user’s queries and present a new method to calculate the degree of similarity between the user’s query and documents. The proposed thesaurus construction method and the proposed query expansion method can improve the performance of information retrieval systems for dealing with document retrieval.
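As a rough illustration of the general pipeline (cluster centers, then a term-term thesaurus, then query expansion), the following simplified sketch may help. It is not the paper's method: it deliberately ignores the degree-of-effect and inner-noise factors, assumes each cluster center is a mapping from terms to weights in [0, 1], scores a term pair by the smaller of the two weights within a center, and uses a hypothetical `threshold` parameter. All names are invented for this example.

```python
def build_thesaurus(cluster_centers):
    """Relate terms that carry weight in the same cluster center.

    Assumption: the relation score is min(weight_a, weight_b), and the best
    score over all centers is kept.
    """
    thesaurus = {}
    for center in cluster_centers:
        for a in center:
            for b in center:
                if a == b:
                    continue
                score = min(center[a], center[b])
                row = thesaurus.setdefault(a, {})
                row[b] = max(row.get(b, 0.0), score)
    return thesaurus

def expand_query(query_terms, thesaurus, threshold=0.5):
    """Add related terms whose score reaches the threshold, weighted by score."""
    expanded = {t: 1.0 for t in query_terms}
    for t in query_terms:
        for related, score in thesaurus.get(t, {}).items():
            if score >= threshold:
                expanded[related] = max(expanded.get(related, 0.0), score)
    return expanded

# Toy cluster centers with made-up term weights.
centers = [
    {"retrieval": 0.9, "search": 0.8, "index": 0.3},
    {"cluster": 0.9, "group": 0.7},
]
thesaurus = build_thesaurus(centers)
expanded = expand_query(["retrieval"], thesaurus)
```

With these toy weights the query "retrieval" is expanded with "search" at weight 0.8, while "index" falls below the threshold and is left out. The expanded, weighted query could then be matched against documents with any similarity measure.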
Automatic Query Expansion: a Structural Linguistic Perspective
A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model the syntagmatic associations that exist between words in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant