Abstract: A large amount of information on the Web is contained in
regularly structured objects, which we call data records. Such
data records are important because they often present the essential
information of their host pages, e.g., lists of products or services.
It is useful to mine such data records in order to extract
information from them to provide value-added services. Existing
automatic techniques are not satisfactory because of their poor
accuracies. In this paper, we propose a more...
Cited by:
Classification Techniques for Categorization of Hypertext Documents - Arumugam
Active bibliography (related documents):
1.1: Mining Data Records in Web Pages - Liu, Grossman, Zhai (2003)
0.3: Agents Need to Become Welcome - Magnin, Snoussi, Pham, Dury, Nie (2002)
0.1: LiveClassifier: Creating Hierarchical Text Classifiers.. - Huang, Chuang, Chien (2004)
Similar documents based on text:
0.3: Building Text Classifiers Using Positive and Unlabeled Examples - Liu, Dai, Li, et al. (2003)
0.2: A Refinement Approach to Handling Model Misfit in Text.. - Wu, Phang, Liu, Li
0.2: Learning with Positive and Unlabeled Examples Using Weighted.. - Lee, Liu (2003)
Related documents from co-citation:
1519: On integrating catalogs - Agrawal, Srikant - 2001
1426: Nearest Neighbor Algorithm for Text Categorization - Baoli, Shiwen et al. - 2003
1271: Mining the Web: Discovering knowledge from hypertext data - Chakrabarti - 2003
BibTeX entry:
Liu, B., Grossman, R. and Zhai, Y. Mining data records in Web pages. UIC Technical Report, 2003. http://citeseer.ist.psu.edu/article/liu03mining.html
@misc{liu03mining,
  author = "B. Liu and R. Grossman and Y. Zhai",
  title = "Mining data records in Web pages",
  text = "Liu, B., Grossman, R. and Zhai, Y. Mining data records in Web pages. UIC Technical Report, 2003.",
  year = "2003",
  url = "citeseer.ist.psu.edu/article/liu03mining.html"
}
Citations (may not include all citations):
171: A scalable comparison shopping agent for the World Wide Web - Doorenbos, Etzioni et al. - 1997
62: A hierarchical approach to wrapper induction - Muslea, Minton et al. - 1999
48: Generating finite-state transducers for semi-structured data.. - Hsu, Dung - 1998
44: Wrapper induction: efficiency and expressiveness - Kushmerick - 2000
40: Record-boundary discovery in Web documents - Embley, Jiang et al. - 1999
39: Algorithms on strings - Gusfield - 1997
36: Mining data records in Web pages - Liu, Grossman et al. - 2003
19: A flexible learning system for wrapping tables and lists in .. - Cohen, Hurst et al. - 2002
8: Automatic data extraction from lists and tables in web sourc.. - Lerman, Knoblock et al. - 2001
3: A fully automated extraction system for the World Wide Web - Buttler, Liu et al. - 2001
2: Algorithms for string matching: A survey - Baeza-Yates - 1989
Documents on the same site (http://www.cs.uic.edu/~liub/publications/papers_chron.html):
Building Text Classifiers Using Positive and Unlabeled Examples - Liu, Dai, Li, et al. (2003)
Learning with Positive and Unlabeled Examples Using Weighted.. - Lee, Liu (2003)
Eliminating Noisy Information in Web Pages for Data Mining - Yi et al. (2003)