by Fabien David Bouskila, École Nationale Supérieure des Télécommunications, Paris
In Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI 2000), Las Vegas
http://www.cse.lehigh.edu/~billp/pubs/BouskilaMSThesis.pdf
Abstract:
The global growth in popularity of the World Wide Web has been enabled in part by the availability of browser-based search tools, which in turn have led to an increased demand for indexing techniques and technologies. This explosive growth is evidenced by the rapid expansion in the number and size of digital collections of documents. Simultaneously, fully automatic content-based indexing techniques have been under development at a variety of institutions. The time is thus ripe for the development of scalable knowledge management systems capable of handling extremely large textual collections distributed across multiple repositories. Hierarchical Distributed Dynamic Indexing (HDDI) dynamically creates a hierarchical index from distributed document collections. At each node of the hierarchy, a knowledge base is created and subtopic regions of semantic locality are identified. This thesis presents an overview of HDDI with a focus on the algorithm that identifies regions of semantic locality within the knowledge bases at each level of the hierarchy.

To my parents, my brother Gautier, my sister Elise, Bertrand and Nathalie.

ACKNOWLEDGMENTS

I would like to thank Professor William Morton Pottenger for his proactive management of the HDDI project, and for his open-door policy towards students. I gratefully acknowledge the assistance and contributions of the staff in the Automated Learning Group directed by Dr. Michael Welge at the National Center for Supercomputing Applications (NCSA), as well as the funding and technical oversight provided by Dr. Tilt Thompkins in the Emerging Technologies Group at NCSA. I also want to thank my the-
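The abstract states that subtopic regions of semantic locality are identified within each node's knowledge base, but it does not spell out the procedure. As a minimal sketch, assuming the knowledge base is a directed graph of concepts whose arcs carry association weights, one way to find such regions is to drop arcs whose weight falls below a threshold and then take the strongly connected components of what remains, using Tarjan's depth-first algorithm. The function name `semantic_regions`, the `(source, target, weight)` arc format, the example concepts and the threshold value are all hypothetical illustrations, not the thesis's actual interface or parameters.

```python
from collections import defaultdict


def semantic_regions(arcs, threshold):
    """Hypothetical sketch: group concepts into regions of semantic locality.

    Assumes `arcs` is an iterable of (source, target, weight) triples and that
    a region is a strongly connected component of the graph left after pruning
    arcs with weight below `threshold`. Uses Tarjan's SCC algorithm.
    """
    graph = defaultdict(list)
    nodes = set()
    for src, dst, weight in arcs:
        nodes.update((src, dst))
        if weight >= threshold:  # keep only sufficiently strong associations
            graph[src].append(dst)

    index = {}         # DFS discovery order of each concept
    low = {}           # smallest discovery index reachable from the subtree
    stack = []         # concepts visited but not yet assigned to a region
    on_stack = set()
    regions = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a strongly connected component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            regions.append(component)

    for v in sorted(nodes):  # sorted only to make the output order deterministic
        if v not in index:
            visit(v)
    return regions


if __name__ == "__main__":
    # Hypothetical concept associations; the weak "network" -> "router" arc is pruned.
    arcs = [
        ("neural", "network", 0.9),
        ("network", "neural", 0.8),
        ("network", "router", 0.2),
        ("router", "packet", 0.7),
        ("packet", "router", 0.6),
    ]
    print(semantic_regions(arcs, threshold=0.5))
    # prints [['neural', 'network'], ['router', 'packet']]
```

Tarjan's algorithm runs in time linear in the number of concepts plus surviving arcs, which suits large knowledge bases; the recursive form here is kept short for readability, and an iterative version would avoid Python's recursion limit on very deep graphs.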