Indexing by latent semantic analysis
Journal of the American Society for Information Science, 1990
Cited by 2168 (30 self)
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
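The decomposition and query handling described in this abstract can be written out as follows. This is a sketch in one common notation, not a reproduction of the paper's equations: the symbols $X$, $T_k$, $S_k$, $D_k$, $q$, $d_j$ and the threshold $\theta$ are labels chosen here, and conventions differ on whether vectors are scaled by $S_k$ before the cosine is taken.

```latex
% X: t x d term-by-document matrix; k: number of retained factors (ca. 100)
X \approx X_k = T_k S_k D_k^{\top},
\qquad T_k^{\top} T_k = D_k^{\top} D_k = I_k

% Query as a pseudo-document: q is the t x 1 vector of query term weights
\hat{q} = q^{\top} T_k S_k^{-1}

% Document j (row d_j of D_k) is returned when its cosine exceeds the threshold
\cos(\hat{q}, d_j) = \frac{\hat{q} \cdot d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert},
\qquad \text{return } j \text{ if } \cos(\hat{q}, d_j) \ge \theta
```

Here $S_k$ is the diagonal matrix of the $k$ largest singular values, so both $\hat{q}$ and each $d_j$ are vectors of $k$ factor weights.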
Bead: Explorations in Information Visualization
In Proceedings of ACM SIGIR, 1992
Cited by 94 (0 self)
We describe work on the visualization of bibliographic data and, to aid in this task, the application of numerical techniques for multidimensional scaling. Many areas of scientific research involve complex multivariate data. One example of this is Information Retrieval. Document comparisons may be done using a large number of variables. Such conditions do not favour the more well-known methods of visualization and graphical analysis, as it is rarely feasible to map each variable onto one aspect of even a three-dimensional, coloured and textured space. Bead is a prototype system for the graphically-based exploration of information. In this system, articles in a bibliography are represented by particles in 3-space. By using physically-based modelling techniques to take advantage of fast methods for the approximation of potential fields, we represent the relationships between articles by their relative spatial positions. Inter-particle forces tend to make similar articles move closer to on...
On the Use of Spreading Activation Methods in Automatic Information Retrieval
1988
Cited by 69 (0 self)
Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.
Optimizing Ranking Functions: A Connectionist Approach to Adaptive Information Retrieval
Department of Computer Science and Engineering, The University of California, San Diego, 1994
Cited by 26 (5 self)
This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems. The goal of a ranked retrieval system is to manage a large collection of text documents and to order documents for a user based on the estimated relevance of the documents to the user's information need (or query). The ordering enables the user to quickly find documents of interest. Ranked retrieval is a difficult problem because of the ambiguity of natural language, the large size of the collections, and because of the varying needs of users and varying collection characteristics. We propose and empirically validate general adaptive methods which improve the ability of a large class of retrieval systems to rank documents effectively. Our main adaptive method is to numerically optimize free parameters in a retrieval system by minimizing a non-metric criterion function. The criterion measures how well the system is ranking documents relative to a target ordering, defined by a set of training queries which include the users' desired document orderings. Thus, the system learns parameter settings which better enable it to rank relevant documents before irrelevant. The non-metric approach is interesting because it is a general adaptive method, an alternative to supervised methods for training neural networks in domains in which rank order or prioritization is important. A second adaptive method is also examined, which is applicable to a restricted class of retrieval systems but which permits an analytic solution. The adaptive methods are applied to a number of problems in text retrieval to validate their utility and practical efficiency. The applications include: A dimensionality reduction of vector-based document representations to a vector spa...
Constructing virtual documents for ontology matching
In Proceedings of the 15th International World Wide Web Conference, 2006
Cited by 26 (6 self)
On the investigation of linguistic techniques used in ontology matching, we propose a new idea of virtual documents to pursue a cost-effective approach to linguistic matching in this paper. Basically, as a collection of weighted words, the virtual document of a URIref declared in an ontology contains not only the local descriptions but also the neighboring information to reflect the intended meaning of the URIref. Document similarity can be computed by traditional vector space techniques, and then be used in the similarity-based approaches to ontology matching. In particular, the RDF graph structure is exploited to define the description formulations and the neighboring operations. Experimental results show that linguistic matching based on the virtual documents is dominant in average F-Measure as compared to three other approaches. It is also demonstrated by our experiments that the virtual documents approach is cost-effective as compared to other linguistic matching approaches.
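The idea of a virtual document as a bag of weighted words, combining a local description with neighboring information and compared by cosine similarity, can be illustrated roughly as below. This is a minimal sketch and not the paper's formulation: the function names, the single `neighbor_weight` parameter and its default of 0.5 are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def virtual_document(local: Counter,
                     neighbors: Iterable[Counter],
                     neighbor_weight: float = 0.5) -> Dict[str, float]:
    """Combine local word counts with down-weighted neighbor word counts.

    neighbor_weight is a hypothetical parameter, not a value from the paper.
    """
    vdoc = {term: float(count) for term, count in local.items()}
    for neighbor in neighbors:
        for term, count in neighbor.items():
            vdoc[term] = vdoc.get(term, 0.0) + neighbor_weight * count
    return vdoc

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weighted-word vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two entities with different local names but a shared neighbor word.
author = virtual_document(Counter(["author"]), [Counter(["book"])])
writer = virtual_document(Counter(["writer"]), [Counter(["book"])])
similarity = cosine(author, writer)
```

With local descriptions alone the two entities share no words and the cosine is zero; the shared neighbor word is what makes them comparable in this toy setup.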
An approach to detecting duplicate bug reports using natural language and execution information
In ICSE ’08: Proceedings of the 30th International Conference on Software Engineering, 2008
Cited by 26 (7 self)
An open source project typically maintains an open bug repository so that bug reports from all over the world can be gathered. When a new bug report is submitted to the repository, a person, called a triager, examines whether it is a duplicate of an existing bug report. If it is, the triager marks it as DUPLICATE and the bug report is removed from consideration for further work. In the literature, there are approaches exploiting only natural language information to detect duplicate bug reports. In this paper we present a new approach that further involves execution information. In our approach, when a new bug report arrives, its natural language information and execution information are compared with those of the existing bug reports. Then, a small number of existing bug reports are suggested to the triager as the most similar bug reports to the new bug report. Finally, the triager examines the suggested bug reports to determine whether the new bug report duplicates an existing bug report. We calibrated our approach on a subset of the Eclipse bug repository and evaluated our approach on a subset of the Firefox bug repository. The experimental results show that our approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.
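The suggestion step described in this abstract, where two kinds of similarity are compared and a small number of candidates are shown to the triager, might look something like the sketch below. The abstract does not say how the two similarities are combined, so the plain average used here is an assumption, as are the function name, the precomputed similarity dictionaries and the parameter `k`.

```python
from typing import Dict, List, Tuple

def suggest_duplicates(nl_sim: Dict[str, float],
                       exec_sim: Dict[str, float],
                       k: int = 3) -> List[Tuple[str, float]]:
    """Rank existing bug reports against a new report and keep the top k.

    nl_sim and exec_sim map an existing report id to its (precomputed)
    natural-language and execution-information similarity to the new report.
    Averaging the two scores is an illustrative assumption, not the
    combination rule from the paper.
    """
    combined = {}
    for bug_id in nl_sim.keys() | exec_sim.keys():
        # A report missing from one source contributes 0.0 for that source.
        combined[bug_id] = (nl_sim.get(bug_id, 0.0) + exec_sim.get(bug_id, 0.0)) / 2
    # Highest combined score first; ties broken by report id for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:k]

# Hypothetical similarity scores for three existing reports.
nl_scores = {"A": 1.0, "B": 0.25, "C": 0.5}
exec_scores = {"A": 0.0, "B": 0.75, "C": 0.75}
suggestions = suggest_duplicates(nl_scores, exec_scores, k=2)
```

The triager would then inspect only the returned reports rather than the whole repository.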
Discovering the semantics of user keywords
Journal on Universal Computer Science. Special Issue: Ontologies and their Applications, 2007
Cited by 16 (11 self)
The technology in the field of digital media generates huge amounts of textual information every day, so mechanisms to retrieve relevant information are needed. Under these circumstances, many times current web search engines do not provide users with the information they seek, because these search tools mainly use syntax based techniques. However, search engines based on semantic and context information can help overcome some of the limitations of current alternatives. In this paper, we propose a system that takes as input a list of plain keywords provided by a user and translates them into a query expressed in a formal language without ambiguity. Our system discovers the semantics of user keywords by consulting the knowledge represented by many (heterogeneous and distributed) ontologies. Then, context information is used to remove ambiguity and build the most probable query. Our experiments indicate that our system discovers the user’s information need better than traditional search engines when the semantics of the request is not the most popular on the Web.
Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm
Information Retrieval, 2000
Cited by 14 (7 self)
Rocchio's similarity-based Relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive supervised learning algorithm from examples.
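For readers unfamiliar with the algorithm named in this abstract, the textbook form of Rocchio relevance feedback moves the query vector toward the centroid of documents judged relevant and away from the centroid of those judged non-relevant. The sketch below shows that textbook form only; it is not necessarily the variant analyzed in the paper, and the weights `alpha`, `beta`, `gamma` with their defaults, as well as the clipping of negative weights, are conventional choices assumed here.

```python
from typing import Dict, List

Vector = Dict[str, float]

def centroid(docs: List[Vector]) -> Vector:
    """Average of a list of sparse term-weight vectors (empty if no docs)."""
    if not docs:
        return {}
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}

def rocchio(query: Vector,
            relevant: List[Vector],
            nonrelevant: List[Vector],
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.15) -> Vector:
    """Textbook Rocchio update: alpha*q + beta*centroid(rel) - gamma*centroid(nonrel).

    The default weights are commonly quoted values, assumed for illustration.
    """
    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    updated: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_c.get(term, 0.0)
                  - gamma * non_c.get(term, 0.0))
        if weight > 0.0:  # negative weights are dropped, a common convention
            updated[term] = weight
    return updated

# One feedback round on a toy query.
new_query = rocchio(
    {"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "cat": 1.0}, {"cat": 1.0}],
    nonrelevant=[{"car": 1.0}],
)
```

Each round uses labeled examples (the relevance judgments) to adjust the query, which is the sense in which the abstract calls the method a supervised learning algorithm.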
On the Necessity of Term Dependence in a Query Space for Weighted Retrieval
Journal of the American Society for Information Science, 1998
Cited by 12 (2 self)
In recent years, in the context of the vector space model, the view, held by many researchers, that documents, queries, terms, etc. are all elements of a common space has been challenged (Bollmann-Sdorra and Raghavan, 1993). In particular, it was noted that term independence has to be investigated in the context of user preferences and it was shown, through counter examples, that term independence can hold in the document space, but not in the query space and vice-versa. In this paper, we continue the investigation of query and document spaces with respect to the property of term independence. We prove, under realistic assumptions, that requiring term independence to hold in the query space is inconsistent with the goal of achieving better performance by means of weighted retrieval. The result that term independence in the query space is undesirable is obtained without making any assumption about whether or not the property of term independence holds in the document space. The result...
Block matching for ontologies
In Proc. of 5th International Semantic Web Conference, 2006
Cited by 12 (0 self)
Ontology matching is a crucial task to enable interoperation between Web applications using different but related ontologies. Today, most of the ontology matching techniques are targeted to find 1:1 mappings. However, block mappings are in fact more pervasive. In this paper, we discuss the block matching problem and suggest that both the mapping quality and the partitioning quality should be considered in block matching. We propose a novel partitioning-based approach to address the block matching issue. It considers both linguistic and structural characteristics of domain entities based on virtual documents, and uses a hierarchical bisection algorithm for partitioning. We set up two kinds of metrics to evaluate the quality of block matching. The experimental results demonstrate that our approach is feasible.

