Abstract: The World-Wide Web provides every internet citizen with access to an
abundance of information, but it becomes increasingly difficult to identify the
relevant pieces of information. Research in web mining tries to address this
problem by applying techniques from data mining and machine learning to
Web data and documents. This chapter provides a brief overview of web
mining techniques and research areas, most notably hypertext classification,
wrapper induction, recommender systems and web usage...
Cited by:
Conceptual Knowledge Processing with Google - Koester (2005)
Similar documents (at the sentence level):
24.7%: Web Structure Mining Exploiting the Graph Structure of the.. - Fürnkranz (2002)
5.3%: Hyperlink Ensembles: A Case Study in Hypertext Classification - Fürnkranz (2001)
Active bibliography (related documents):
0.6: WebMate: A Personal Agent for Browsing and Searching - Chen, Sycara (1998)
0.5: Personalising On-Line Information Retrieval Support with a Genetic.. - Er (1996)
0.5: Using Context to Assist in Personal File Retrieval - Soules (2006)
Similar documents based on text:
0.1: ParaMEME: A Parallel Implementation and a Web Interface for a.. - Grundy et al. (1996)
0.1: Integrating External Information Sources to Guide Worldwide.. - Monge, Elkan (1995)
0.1: The WEBFIND tool for finding scientific papers over the.. - Monge, Elkan (1996)
BibTeX entry:
Johannes Fürnkranz. Web mining. In Oded Maimon and Lior Rokach, editors, The Data Mining and Knowledge Discovery Handbook, pages 899--920. Springer, 2005. http://citeseer.ist.psu.edu/urnkranz04web.html
@misc{ urnkranz05web,
author = "J. F{\"u}rnkranz",
title = "Web mining",
text = "Johannes F{\"u}rnkranz. Web mining. In Oded Maimon and Lior Rokach, editors,
The Data Mining and Knowledge Discovery Handbook, pages 899--920. Springer,
2005.",
year = "2005",
url = "citeseer.ist.psu.edu/urnkranz04web.html" }
Citations (may not include all citations):
641
The anatomy of a large-scale hypertextual Web search engine
- Brin, Page - 1998
576
Authoritative sources in a hyperlinked environment
- Kleinberg - 1999
568
Indexing by latent semantic analysis
- Deerwester, Dumais et al. - 1990
492
Learning logical definitions from relations (context) - Quinlan - 1990
463
Term-weighting approaches in automatic text retrieval (context) - Salton, Buckley - 1988
432
Automatic Text Processing: The Transformation (context) - Salton - 1989
404
Agents that reduce work and information overload (context) - Maes - 1994
375
On power-law relationships of the internet topology
- Faloutsos, Faloutsos et al. - 1999
318
Scientific American (context) - Berners-Lee, Hendler et al. - 2001
225
NewsWeeder: Learning to filter netnews
- Lang - 1995
215
A comparative study on feature selection in text categorizat..
- Yang, Pedersen - 1997
207
WebWatcher: A learning apprentice for the world wide web
- Armstrong, Freitag et al. - 1995
188
Empirical analysis of predictive algorithms for collaborativ..
- Breese, Heckerman et al. - 1998
178
A softbot-based interface to the internet
- Etzioni, Weld - 1994
171
A scalable comparison-shopping agent for the World-Wide Web
- Doorenbos, Etzioni et al. - 1997
164
Syskill & Webert: Identifying interesting web sites (context) - Pazzani, Muramatsu et al.
163
Improved algorithms for topic distillation in a hyperlinked ..
- Bharat, Henzinger - 1998
155
Grouplens: Applying collaborative filtering to usenet news
- Konstan, Miller et al. - 1997
154
Automatic resource compilation by analyzing hyperlink struct..
- Chakrabarti, Dom et al. - 1998
140
Graph structure in the Web (context) - Broder, Kumar et al. - 2000
140
A comparison of event models for naive bayes text classifica..
- McCallum, Nigam - 1998
139
Machine learning in automated text categorization
- Sebastiani - 2002
132
Data preparation for mining world wide web browsing patterns
- Cooley, Mobasher et al. - 1999
129
Searching the world wide web
- Lawrence, Giles - 1998
126
Diameter of the world-wide web (context) - Albert, Jeong et al. - 1999
124
Learning information retrieval agents: Experiments with auto..
- Balabanovic, Shoham - 1995
123
A vector space model for automatic indexing (context) - Salton, Wong et al. - 1975
114
Learning interface agents (context) - Kozierok, Maes - 1993
111
Collaborative interface agents
- Lashkari, Metral et al. - 1994
105
Learning information extraction rules for semi-structured an..
- Soderland - 1999
90
Enhanced hypertext categorization using hyperlinks
- Chakrabarti, Dom et al. - 1998
90
Ensemble methods in machine learning
- Dietterich - 2000
87
Ontologies: Silver Bullet for Knowledge Management and Elect..
- Fensel - 2001
85
Web usage mining: Discovery and applications of usage patter..
- Srivastava, Cooley et al. - 2000
82
Finding related pages in the World Wide Web
- Dean, Henzinger - 1999
81
Learning to construct knowledge bases from the World Wide We..
- Craven, DiPasquo et al. - 2000
77
Evolving agents for personalized information filtering (context) - Sheth, Maes - 1993
75
ParaSite: Mining structural information on the Web (context) - Spertus - 1997
73
Information extraction from HTML: Application of a general m..
- Freitag - 1998
73
An evaluation of phrasal and clustered representations on a .. (context) - Lewis - 1992
68
A technique for measuring the relative size and overlap of p.. (context) - Bharat, Broder - 1998
66
Learning rules that classify e-mail
- Cohen - 1996
65
GENVL and WWWW: Tools for taming the Web
- McBryan - 1994
64
Automatically generating extraction patterns from untagged t..
- Riloff - 1996
62
Automatic personalization based on web usage mining
- Mobasher, Cooley et al. - 2000
57
Optimizing search engines using clickthrough data
- Joachims - 2002
52
The connectivity server: Fast access to linkage information .. (context) - Bharat, Broder et al. - 1998
48
Generating finite-state transducers for semistructured data ..
- Hsu, Dung - 1998
47
Towards adaptive web sites: Conceptual framework and case st..
- Perkowitz, Etzioni - 2000
47
Item-based collaborative filtering recommendation algorithms
- Sarwar, Karypis et al. - 2001
45
Moving up the information food chain: Deploying softbots on ..
- Etzioni
44
Wrapper induction: Efficiency and expressiveness
- Kushmerick - 2000
41
Determinate literals in inductive logic programming (context) - Quinlan - 1991
38
Web mining research: A survey
- Kosala, Blockeel - 2000
35
Latent class models for collaborative filtering (context) - Hofmann, Puzicha - 1999
33
An empirical study of automated dictionary construction for ..
- Riloff - 1996
31
Clustering methods for collaborative filtering
- Ungar, Foster - 1998
31
Interface agents that learn: An investigation of learning is..
- Payne, Edwards - 1997
29
A study of approaches to hypertext categorization
- Yang, Slattery et al. - 2002
28
Learning relations by pathfinding
- Richards, Mooney - 1992
24
First-order learning for Web mining
- Craven, Slattery et al. - 1998
24
Data mining for hypertext: A tutorial survey
- Chakrabarti - 2000
22
Probabilistic models for unified collaborative and content-b..
- Popescul, Ungar et al. - 2001
21
Relational learning with statistical predicate invention: Be..
- Craven, Slattery - 2001
20
Knowledge-based navigation of complex information spaces
- Burke, Hammond et al. - 1996
17
Discovery and evaluation of aggregate usage profiles for web..
- Mobasher, Dai et al. - 2002
16
A case study in using linguistic phrases for text categoriza.. (context) - Fürnkranz, Mitchell et al. - 1998
16
A practical hypertext categorization method using links and .. (context) - Oh, Myaeng et al. - 2000
16
Special issue on recommender systems (context) - Resnick, Varian - 1997
15
Towards semantic web mining
- Berendt, Hotho et al. - 2002
15
Discovering test set regularities in relational domains
- Slattery, Mitchell - 2000
15
Knowledge portals --- ontologies at work
- Staab, Maedche - 2001
14
Information extraction from world wide web -- a survey
- Eikvil - 1999
14
Better bayesian filtering (context) - Graham - 2003
14
A unifying approach to HTML wrapper representation and learn..
- Grieser, Jantke et al. - 2000
14
Feature subset selection in text-learning (context) - Mladenić - 1998
14
Efficient adaptive-support association rule mining for recom..
- Lin, Alvarez et al. - 2002
13
Electronic commerce recommender applications (context) - Schafer, Konstan et al. - 2000
12
Discovery of web robot sessions based on their navigational ..
- Tan, Kumar - 2002
10
Wrapper maintenance: A machine learning approach
- Lerman, Minton et al. - 2003
10
Content-boosted collaborative filtering for improved recomme..
- Melville, Mooney et al. - 2002
8
Ontology learning part one --- on discovering taxonomic rela..
- Maedche, Pekar et al. - 2003
8
Knowledge and Information Systems (context) - Levene, Borges et al. - 2001
8
Feature engineering for text classification
- Scott, Matwin - 1999
7
Bottom-up relational learning of pattern matching rules for ..
- Califf - 2003
7
Web-collaborative filtering: Recommending music by crawling ..
- Cohen, Fan - 2000
6
The Lixto data extraction project --- back and forth between..
- Gottlob, Koch et al. - 2004
6
Learning ontologies for the semantic web
- Maedche, Staab - 2001
6
Department of Intelligent Systems (context) - Mladenić, WebWatcher et al. - 1996
6
Effective web data extraction with standard XML technologies
- Myllymaki - 2001
5
Turning Yahoo into an automatic web-page classifier (context) - Mladenić - 1998
5
Using site semantics to analyze (context) - Berendt - 2002
5
Word sequences as features in text learning (context) - Mladenić, Grobelnik - 1998
4
Technical paper recommendation: A study in combining multipl..
- Basu, Hirsh et al. - 2001
4
Learning to filter unsolicited commercial e-mail (context) - Androutsopoulos, Paliouras et al. - 2004
4
Web usage mining as a tool for personalization: A survey (context) - Pierrakos, Paliouras et al. - 2003
4
Relational Data Mining: Inductive Logic Programming for Know.. (context) - Džeroski, Lavrač - 2001
3
IEMS -- the intelligent email sorter (context) - Crawford, Kay et al. - 2002
3
The laborious way from data mining to web log mining
- Spiliopoulou - 1999
3
Mining the World Wide Web: An Information Search Approach (context) - Chang, Healy et al. - 2001
2
Communications of the ACM (context) - Berners-Lee, Cailliau et al. - 1994
2
Using collaborative filtering to weave an information tapes.. (context) - Goldberg, Nichols et al. - 1992
2
Hybrid hill-climbing and knowledge-based methods for intelli.. (context) - Mock - 1996
2
Mining the Web: Analysis of Hypertext and Semi Structured Da.. (context) - Chakrabarti - 2002
1
Text-learning and related intelligent agents: A survey (context) - Mladenić - 1999
1
spam filtering: A challenge problem for data mining (context) - Fawcett, vivo - 2003
1
Austrian Research Institute for Artificial Intelligence (context) - Fürnkranz, using et al. - 1998
1
Hyperlink ensembles: A case study in hypertext classificatio.. (context) - Fürnkranz - 2002
1
Frequently-asked question files: Experiences with the FAQ fi.. (context) - Burke, Hammond et al. - 1997
1
User profiling for the Melvil knowledge retrieval system (context) - Fürnkranz, Holzbaur et al. - 2002
1
Wiemer-Hastings (context) - Staab, Maedche et al. - 2000
1
Information Extraction in the Web Era: Natural Language Comm.. (context) - Pazienza - 2003
1
Learning to match ontologies (context) - Doan, Madhavan et al. - 2003
1
Machine Learning for Information Extraction: Proceedings of .. (context) - Califf - 1999
1
Email answering assistance by semi-supervised text classific..
- Scheffer - 2004
Documents on the same site (http://faure.isti.cnr.it/~fabrizio/CP2(Google)-pdf.html):
Deliverable Identification Sheet - Project Ref No
How Weak Text Categorizers Can Strengthen Performance.. - Uren, Addis (2001)
Low level information extraction: a Bayesian network based.. - Bouckaert
CiteSeer.IST - Copyright Penn State and NEC