Results 1 - 8 of 8
Latent semantic indexing: A probabilistic analysis
1998
Cited by 210 (8 self)
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
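As a rough illustration of the spectral idea in this abstract, the sketch below applies a truncated SVD to a tiny term-document matrix and compares a query against documents in the reduced space. The vocabulary, the matrix, and the helper names `lsi` and `cosine` are hypothetical and chosen only for illustration; this is not the authors' experimental setup, and it omits the random-projection speedup they propose.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
terms = ["car", "auto", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # auto
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def lsi(matrix, k):
    """Return the top-k left singular vectors and the k-dimensional document vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    U_k = U[:, :k]
    # Document coordinates in the latent space, one row per document
    # (equal to the columns of U_k^T @ matrix).
    docs_k = (s[:k, None] * Vt[:k, :]).T
    return U_k, docs_k


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


U_k, docs_k = lsi(A, k=2)

# Query containing only the term "car", projected into the same latent space.
query = np.array([1, 0, 0, 0, 0], dtype=float)
query_k = U_k.T @ query

# Document 1 contains "auto" and "engine" but not "car".
raw_sim = cosine(query, A[:, 1])        # similarity in the original term space
lsi_sim = cosine(query_k, docs_k[1])    # similarity in the latent space
print(f"raw similarity: {raw_sim:.2f}, LSI similarity: {lsi_sim:.2f}")
```

In this toy case the query shares no term with document 1, so the raw cosine similarity is zero, while the rank-2 representation places both in the same latent direction because "car" and "auto" co-occur with "engine".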
The case for a Portuguese web search engine
In Proceedings of ICWI-2003, the IADIS International Conference WWW/Internet 2003, 2003
Cited by 17 (14 self)
This paper discusses the importance of search engines specialised in providing services to community Webs and presents the case for a search engine for the Portuguese Web community. In addition to a search service optimised for these users, there are other services that can be addressed by such a system, including the preservation of publications of historical interest for future access by the community and obtaining knowledge about the preferences and interests of this society in the information age. The paper also introduces the architecture and design of tumba!, a new Internet search engine for the Portuguese Web. Tumba! uses a new repository architecture and implements innovative ranking and presentation algorithms. It is designed to take advantage of our unique knowledge of this Web, making it a platform that could work as a testbed for research and evaluation of new ideas in Web information retrieval and computational processing of the Portuguese language.
An Approach to Relate the Web Communities Through Bipartite Graphs
WISE, 2001
Cited by 8 (0 self)
The Web harbors a large number of community structures. Early detection of community structures has many purposes, such as reliable searching and selective advertising. In this paper we investigate the problem of extracting and relating Web community structures from a large collection of Web pages by performing hyperlink analysis. The proposed algorithm extracts potential community signatures by extracting the corresponding dense bipartite graph (DBG) structures from the given data set of Web pages. Further, the proposed algorithm can also be used to relate the extracted community signatures. We report the results of experiments conducted on the 10 GB TREC (Text REtrieval Conference) data collection, which contains 1.7 million pages and 21.5 million links. The results demonstrate that the proposed approach extracts meaningful community signatures and relates them.
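The abstract above does not say exactly what makes a bipartite graph "dense". One plausible reading, assumed here and not taken from the paper, is that every "fan" page links to at least p "center" pages and every center is linked from at least q fans. Under that assumption, a candidate set of pages can be reduced to a dense bipartite graph by repeated pruning. The function name and the toy link data are hypothetical.

```python
def prune_to_dense_bipartite(out_links, fans, centers, p, q):
    """Shrink (fans, centers) until each fan links to >= p centers
    and each center is linked from >= q fans (assumed density criterion)."""
    fans, centers = set(fans), set(centers)
    changed = True
    while changed:
        changed = False
        # Drop fans with too few links into the current center set.
        weak_fans = {f for f in fans if len(out_links.get(f, set()) & centers) < p}
        if weak_fans:
            fans -= weak_fans
            changed = True
        # Drop centers with too few links from the current fan set.
        weak_centers = {
            c for c in centers
            if sum(1 for f in fans if c in out_links.get(f, set())) < q
        }
        if weak_centers:
            centers -= weak_centers
            changed = True
    return fans, centers


# Hypothetical link data: page -> set of pages it links to.
links = {
    "f1": {"a", "b", "c"},
    "f2": {"a", "b"},
    "f3": {"b", "c"},
    "f4": {"d"},
}
dbg_fans, dbg_centers = prune_to_dense_bipartite(
    links, fans=links.keys(), centers={"a", "b", "c", "d"}, p=2, q=2
)
print(sorted(dbg_fans), sorted(dbg_centers))
```

In this toy graph the surviving structure is not a complete bipartite graph, since f2 does not link to c, but it still satisfies the assumed density thresholds.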
Inferring Web Communities Through Relaxed Cocitation and Dense Bipartite Graphs
In Proc. of Data Base Engineering Workshop, 2001
Cited by 4 (0 self)
Community forming is one of the important activities on the Web. The Web harbors a large number of communities. A community is a group of content creators that manifests itself as a set of interlinked pages. Given a large collection of pages, our aim is to find potential communities on the Web. In the literature, Ravi Kumar et al. [18] proposed a trawling method to find potential communities by abstracting the core of a community as a group of pages that form a complete bipartite graph (CBG), with each Web page as a node and each link as an edge between two nodes. The trawling approach extracts a small group of pages that form a CBG, which is the signature of a potential community.
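To make the CBG signature described above concrete, the following sketch enumerates complete bipartite cores in a small link graph by brute force: it looks for i "fan" pages that all link to the same j "center" pages. This is only an illustration of the definition, not the trawling algorithm of Kumar et al., which is designed to scale to large crawls. The function name and the link data are hypothetical.

```python
from itertools import combinations


def find_bipartite_cores(out_links, i, j):
    """Return (fans, centers) pairs where at least i fans each link to all j centers."""
    all_centers = sorted({c for targets in out_links.values() for c in targets})
    cores = []
    for centers in combinations(all_centers, j):
        required = set(centers)
        # A fan qualifies if its out-links cover every center in this candidate set.
        fans = sorted(p for p, targets in out_links.items() if required <= targets)
        if len(fans) >= i:
            cores.append((fans, list(centers)))
    return cores


# Hypothetical link data: page -> set of pages it links to.
links = {
    "fan1": {"a", "b", "c"},
    "fan2": {"a", "b"},
    "fan3": {"a", "b", "d"},
    "other": {"d"},
}
cores = find_bipartite_cores(links, i=3, j=2)
print(cores)
```

The enumeration is exponential in j, so it is suitable only for small examples like this one.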
Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition
2002
Cited by 3 (0 self)
As more information becomes available on the World Wide Web (there are currently over 4 billion pages covering most areas of human endeavor), it becomes more difficult to provide effective search tools for information access. Today, people access web information through two main kinds of search interfaces: Browsers (clicking and following hyperlinks) and Query Engines (queries in the form of a set of keywords showing the topic of interest). The first process is tentative and time consuming and the second may not satisfy the user because of many inaccurate and irrelevant results. Better support is needed for expressing one's information need and returning high quality search results by web search tools. There appears to be a need for systems that do reasoning under uncertainty and are flexible enough to recover from the contradictions, inconsistencies, and irregularities that such reasoning involves.
An Approach to Find Related Communities Based on Bipartite Graphs
2001
In this paper we investigate the problem of extracting related community information from a large collection of Web pages by performing hyperlink analysis. We consider a group of communities (a related community set) to be related if they share common interests in some topic. In the proposed approach, we employ the dense bipartite graph (DBG) abstraction for two purposes. From the given page collection, we first extract all the communities by mathematically abstracting a community as a DBG over a set of pages. Next, our approach extracts related communities by abstracting a related community set as a DBG over a set of communities. We report experimental results on the 10 GB TREC (Text REtrieval Conference) data collection, which contains 1.7 million pages and 21.5 million links. The results demonstrate that the proposed approach extracts related community structures.
A Document Clustering and Ranking System . . .
Objective: A major problem faced in biomedical informatics involves how best to present information retrieval results. When a single query retrieves many results, simply showing them as a long list often provides a poor overview. With the goal of presenting users with reduced sets of relevant citations, this study developed an approach that retrieved and organized MEDLINE citations into different topical groups and prioritized important citations in each group. Design: A text mining system framework for automatic document clustering and ranking organized MEDLINE citations following simple PubMed queries. The system grouped the retrieved citations, ranked the citations in each cluster, and generated a set of keywords and MeSH terms to describe the common theme of each cluster. Measurements: Several possible ranking functions were compared, including citation count per year (CCPY), citation count (CC), and journal impact factor (JIF). We evaluated this framework by identifying as “important” those articles selected by the Surgical Oncology Society. Results: Our results showed that CCPY outperforms CC and JIF, i.e., CCPY ranked important articles better than the others did. Furthermore, our text clustering and knowledge extraction strategy grouped the retrieval results into informative clusters, as revealed by the keywords and MeSH terms extracted from the documents in each cluster. Conclusions: The text mining system studied effectively integrated text clustering, text summarization, and text ranking and organized MEDLINE retrieval results into different topical groups.
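The difference between ranking by citation count (CC) and by citation count per year (CCPY) can be sketched with a few lines of code. The abstract does not give the exact CCPY formula, so the version below, citations divided by the number of years since publication counting the publication year, is an assumption, and the article records are invented purely for illustration.

```python
def ccpy(citations, pub_year, current_year):
    """Citation count per year, assuming the publication year counts as year one."""
    years = max(current_year - pub_year + 1, 1)
    return citations / years


# Hypothetical records: (identifier, citation count, publication year).
articles = [
    ("old_classic", 300, 1990),
    ("recent_hit", 120, 2006),
    ("minor", 10, 2000),
]
current_year = 2008

# Rank the same articles by raw citation count and by CCPY.
by_cc = [a[0] for a in sorted(articles, key=lambda a: a[1], reverse=True)]
by_ccpy = [
    a[0]
    for a in sorted(articles, key=lambda a: ccpy(a[1], a[2], current_year), reverse=True)
]
print("by CC:  ", by_cc)
print("by CCPY:", by_ccpy)
```

Under this assumed formula, a recent article that accumulates citations quickly moves ahead of an older article with a larger total, which is the kind of reordering a per-year measure is meant to capture.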
The Pleasures And Perils Of Social Software
This paper describes some theoretical underpinnings for the use of social software (such as wikis, blogs, tagged link sharing) in e-learning, in particular extending earlier models of interaction, and highlighting the emergent nature of the group as a distinct entity. It goes on to describe some of the uses of social software in undergraduate and postgraduate teaching, examining the consequent benefits and dangers.

