Results 1 - 10 of 29,480
Web Document Clustering: A Feasibility Demonstration
, 1998
"... Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major s ..."
Cited by 435 (3 self)
Web documents
, 2007
"... This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or sel ..."
Web documents
, 2002
"... Developing a specialized directory system by automatically classifying ..."
Record-Boundary Discovery In Web Documents
, 1998
"... Extraction of information from unstructured or semistructured Web documents often requires a recognition and delimitation of records. (By "record" we mean a group of information relevant to some entity.) Without first chunking documents that contain multiple records according to record bou ..."
Cited by 127 (20 self)
Characterizing Web Document Change
- In Advances in Web-Age Information Management, 2nd Intl. Conf., WAIM 2001
, 2001
"... The World Wide Web is growing and changing at an astonishing rate. For the information in the web to be useful, web information systems such as search engines have to keep up with the growth and change of the web. In this paper we study how web documents change. In particular, we study two import ..."
Cited by 20 (3 self)
Embedding Knowledge in Web Documents
, 1999
"... The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the content of Web documents and representing knowledge within them. We believe that these languages have advantages over metadat ..."
Cited by 59 (9 self)
Web Document Modelling and Clustering
- CMMIS Workshop in the Int’l Conf. on Entity-Relationship Approaches
, 1997
"... Great number of the web documents, created by people of a great variety of walks and used by all the people being able to access to the Internet, gives rise to a problem of how to search the Internet to easily obtain what users want and to filter out what they don't. The problem is strongly rel ..."
Cited by 1 (0 self)
Web Document Outliner
"... We present an HTML outliner that allows users to toggle between Web documents and their outlines. We describe three versions of the outliner developed using JavaScript. The method is distinguished by its simplicity and consequent ease of use by Web document authors. ..."
Focused crawling: a new approach to topic-specific Web resource discovery
, 1999
"... The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevan ..."
Cited by 637 (10 self)
that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links
Genre classification of web documents
- In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-05), Poster
, 2005
"... Retrieving relevant documents over the Web is an overwhelming task when search engines return thousands of Web documents. Sifting through these documents is time-consuming and sometimes leads to an unsuccessful search. One problem is that most search engines rely on matching a query to documents b ..."
Cited by 2 (0 self)