
Mining Data Records in Web Pages (2003)  (36 citations)
Bing Liu, Robert Grossman, Y. Zhai



View or download:
uic.edu/~liub/publ...003dataRecord.pdf

Abstract: A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them to provide value-added services. Existing automatic techniques are not satisfactory because of their poor accuracies. In this paper, we propose a more...

Cited by:
Classification Techniques for Categorization of Hypertext Documents - Arumugam

Active bibliography (related documents):
1.1:   Mining Data Records in Web Pages - Liu, Grossman, Zhai (2003)
0.3:   Agents Need to Become Welcome - Magnin, Snoussi, Pham, Dury, Nie (2002)
0.1:   LiveClassifier: Creating Hierarchical Text Classifiers.. - Huang, Chuang, Chien (2004)

Similar documents based on text:
0.3:   Building Text Classifiers Using Positive and Unlabeled Examples - Liu, Dai, Li, et al. (2003)
0.2:   A Refinement Approach to Handling Model Misfit in Text.. - Wu, Phang, Liu, Li
0.2:   Learning with Positive and Unlabeled Examples Using Weighted.. - Lee, Liu (2003)

Related documents from co-citation:
1519:   On integrating catalogs - Agrawal, Srikant - 2001
1426:   Nearest Neighbor Algorithm for Text Categorization - Baoli, Shiwen et al. - 2003
1271:   Mining the Web: Discovering knowledge from hypertext data - Chakrabarti - 2003

BibTeX entry:

Liu, B., Grossman, R. and Zhai, Y. Mining data records in Web pages. UIC Technical Report, 2003. http://citeseer.ist.psu.edu/article/liu03mining.html

@misc{ liu03mining,
  author = "B. Liu and R. Grossman and Y. Zhai",
  title = "Mining data records in Web pages",
  text = "Liu, B., Grossman, R. and Zhai, Y. Mining data records in Web pages. UIC
    Technical Report, 2003.",
  year = "2003",
  url = "citeseer.ist.psu.edu/article/liu03mining.html" }
Citations (may not include all citations):
171   A scalable comparison shopping agent for the World Wide Web - Doorenbos, Etzioni et al. - 1997
62   A hierarchical approach to wrapper induction - Muslea, Minton et al. - 1999
48   Generating finite-state transducers for semi-structured data.. - Hsu, Dung - 1998
44   Wrapper induction: efficiency and expressiveness - Kushmerick - 2000
40   Record-boundary discovery in Web documents - Embley, Jiang et al. - 1999
39   Algorithms on strings - Gusfield - 1997
36   Mining data records in Web pages - Liu, Grossman et al. - 2003
19   A flexible learning system for wrapping tables and lists in .. - Cohen, Hurst et al. - 2002
8   Automatic data extraction from lists and tables in web sourc.. - Lerman, Knoblock et al. - 2001
3   A fully automated extraction system for the World Wide Web - Buttler, Liu et al. - 2001
2   Algorithms for string matching: A survey - Baeza-Yates - 1989



Documents on the same site (http://www.cs.uic.edu/~liub/publications/papers_chron.html):
Building Text Classifiers Using Positive and Unlabeled Examples - Liu, Dai, Li, et al. (2003)
Learning with Positive and Unlabeled Examples Using Weighted.. - Lee, Liu (2003)
Eliminating Noisy Information in Web Pages for Data Mining - Yi, et al. (2003)


CiteSeer.IST - Copyright Penn State and NEC