| G. Salton: "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer," Addison Wesley, 1989. |
....storage of formatted and unformatted data, such as text, voice and image in the same database. For simplicity, an instance of any kind of data will be referred to as a record in the rest of this paper. Efficient file structures and search techniques must be developed for such multimedia databases [1, 7]. Signature files provide a space efficient fast search structure by searching the signatures, instead of searching the actual records. A record signature is a bit string reflecting the essence of the record attributes. Insertions and updates in signature files require less time compared to ....
G. Salton, Automatic Text Processing: The Transformation Analysis, and Retrieval of Information by Computer (Addison-Wesley, Reading, MA. 1989)
....Agent (PVA) that can automatically organize a personal view by learning the user's interests, and adapt the personal view to the user's changing interests. There are three important features in PVA: 1) PVA learns the user profile without intervening in the user's browsing, as relevance feedback [13] does. Although relevance feedback, which asks users to manually rate pages, may make sense, it is troublesome to users and seldom done. In PVA, we use a proxy to collect the training data without user intervention. 2) A user may have interests in multiple domains. PVA, like the previous works [2, 14] ....
....users to manually rate pages, may make sense, it is troublesome to users and seldom done. In PVA, we use a proxy to collect the training data without user intervention. 2) A user may have interests in multiple domains. PVA, like the previous works [2, 14], models each domain in the vector space model [13]. ....
Gerard Salton, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Addison Wesley, 1989.
....as articles or technical papers, research reports explicitly contain the keywords. These explicit keywords are treated with priority. By experience, we state that they are potential research topics. Documents without explicit keywords are processed with the document representation (bag of words) [Salton 89]. By using our categorization model, we can extract the important words in the bag of words. Some rules are created during the .... [Table I: Categorization of research topics (Computer Science, Artificial Intelligence, Machine Learning, Knowledge Discovery, Data ....)]
Salton, G.: "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Addison-Wesley, 1989.
....page into content blocks, features of each block are simultaneously extracted. In this paper, features correspond to meaningful keywords. Stop words are not included. By applying the Porter stemming algorithm [15] and removing stop words in the stop list, English keywords (features) can be extracted [16]. Extracting keyword features written in oriental languages seems more difficult because of a lack of trivial separators specified in these languages. However, many studies have applied statistical approaches to extracting keywords of oriental languages [7]. In our lab, we developed an algorithm ....
....of a feature is estimated according to the weight distribution of features appearing in a page cluster. For easy calculation of each feature's entropy, features of content blocks in a page can be grouped and represented as a feature-document list with term frequency (TF) or weight (such as TFxIDF [16] or its variations [18]). Considering all pages in a cluster, these lists of pages form the feature-document matrix (F-D matrix). The F-D matrix can be generated while extracting features of documents in the cluster with time complexity O(D F log F), where F is the average number of ....
Salton, G., "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Addison Wesley, 1989.
....based on the class taxonomy in the context of document classification. The work in [20] discusses document classification without using hierarchical classification. The Bayesian network as a model for data mining has been studied in [6, 5, 10, 7]. Feature selection is discussed in some work [11, 16]. The general method is to define a measure first and then search for a subset of features that optimizes this measure. The Fisher Index method in [4] also follows this line; it does so, however, in a localized manner, i.e. one term at a time. Although this local method has the weakness of not ....
G. Salton, "Automatic text processing, the transformation analysis and retrieval of information by computer", Addison-Wesley, 1989.
....such a simple method is to quantitatively measure how much improvement is obtained by using statistical learning compared to a non-learning approach. The conventional vector space model is used for representing documents and category names (each name is treated as a bag of words) and the SMART [30] system is used as the search engine. 2.1.2.3 k-Nearest Neighbor: Given an arbitrary input document, the system ranks its nearest neighbors among the training documents, and uses the categories of the k top-ranking neighbors to predict the categories of the input document. There are two main methods ....
....are used for category ranking. The similarity value between two instances is the distance between them based on a distance metric. In general, the Euclidean distance metric is the most commonly used. 2.1.2.4 k-Nearest Neighbor Feature Projection: The k-NNFP technique is a variant of the k-NN method [30]. The most important characteristic of the k-NNFP technique is that the training instances are stored as their projections on each feature dimension and the distance between two instances is calculated according to a single feature. This allows the classification of a new instance to be made much faster than ....
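The k-nearest-neighbor idea in the excerpts above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the Euclidean metric comes from the excerpt, but the majority vote over the k neighbors, the function names (`euclidean`, `knn_predict`) and the toy training data are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, training, k=3):
    """Predict a category for `query` from its k nearest training instances.

    `training` is a list of (vector, category) pairs. The prediction rule
    (simple majority vote) is an assumption, not taken from the excerpt.
    """
    # Rank training instances by distance to the query, nearest first.
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(category for _, category in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional document vectors with made-up categories.
training = [
    ([1.0, 0.0], "sports"),
    ([0.9, 0.1], "sports"),
    ([0.0, 1.0], "politics"),
    ([0.1, 0.9], "politics"),
]

print(knn_predict([0.8, 0.2], training, k=3))
```

With k=3 the two closest instances to the query are labelled "sports", so they outvote the single "politics" neighbor.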
Salton, G., Automatic Text Processing: The Transformation Analysis and Retrieval of Information by Computer, Addison-Wesley, Reading, MA, 1989.
....or subsystems that captures, processes, stores, analyses, condenses, and disseminates information in various forms. Traditionally, information systems are text-oriented and provide reports, documents, and decision-making information for all levels of the hierarchy within an organization [11]. Such a system is characterized by a text-in, text-out mode of operation, focusing primarily on structured fields and free text. However, this style of IS is becoming obsolete, since information is no longer text-based but is instead based on a combination of text, audio, video, and image, together with the ....
....and easily accessible. The indexing strategies for conventional information systems are well studied; many architectures of popular indices, like dictionaries, catalogs, wordlists, etc., have been explored and evaluated. The strategies for search procedures based on such indices are explained in detail [11]. This is not the case for multimedia, visual or some other contemporary information systems. Methods of search and retrieval for multimedia information, as well as those for spatial reasoning, are still in early development. The presence of multimedia platforms in IS, equipped with audio and ....
Salton G. (1988). Automatic Text Processing -- The Transformation Analysis, and Retrieval of Information by Computer. Addison-Wesley Publishing Co., Reading, MA.
....users. Terms (footnote 1) are extracted from a document, stemmed, stored and indexed in a database or other storage system by applying an indexing approach [12]. The user query is usually represented by a sequence of terms that are matched with the indexed terms based on the TF-IDF algorithm or a similar algorithm [9] to retrieve relevant documents. In order to distinguish the relevance of the documents to the query, the retrieved documents are presented in a ranked list. [Footnote 1: In the paper, a term is considered to be a word extracted from a document. In this paper, keyword is representative of a concept, ....]
....word-based information retrieval (IR) systems are efficient in handling a large document base. However, documents collected from the Internet are extremely numerous. In such a case, for a query with two words (footnote 2) submitted to a search engine implemented with similarity-based algorithms [8, 9], thousands of documents are probably retrieved. For the query example "education and university", there are 87,368,493 hits by AltaVista, 7,379,086 hits by Infoseek, 237,902 hits by WebCrawler and 2,879 hits by Yahoo. Ranking a large number of documents using very few words is not likely to ....
G. Salton, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Addison Wesley, 1989.
....the application of the database. Querying and browsing are different ways to make use of the results from the indexing efforts, and thus are equally dependent on the model of the video data. Metadata models are used in many parts of computer science, such as databases [32], information retrieval [28], knowledge representation [31] and image processing [8, 11]. Models from different domains tend to focus on different types of information and solve different aspects of the problem of modelling and retrieving information. Considering the different types of data in a shared video database as ....
G. Salton. Automatic Text Processing - The Transformation Analysis, and Retrieval of Information by Computer. Addison-Wesley Publishing Company, 1988.
....of this paper. However, storing and searching unformatted data in tables of a relational database system is inefficient. Therefore, efficient data structures and search techniques must be developed for purely or partially unformatted database records [Aktug and Can 1993, Can 1993, Faloutsos 1988, Salton 1989, Van Rijsbergen 1979]. For search and retrieval purposes, unformatted data is described by a set of descriptors (attributes) [Douglas and Stephanie 1989, Rabitti and Savino 1991, Salton 1975, Salton and Buckley 1988]. For example, a document can be described by the words used in the text. These ....
SALTON, G. 1989. Automatic Text Processing: The Transformation Analysis, and Retrieval of Information by Computer. Addison Wesley, Reading, MA.
....data already exist in forms that cannot easily be migrated into a new DBMS. Content-based retrieval of data is highly type-specific. Years of research have produced a solid technology base for content-based retrieval of documents through the use of various text indexing and search techniques [33], [22]. Similarly, simple spatial searches are well supported by today's geographic information systems (e.g., [29], [30]). Image content-based retrieval of visual data is still in its infancy. Although a few specialized commercial applications exist (such as fingerprint matching systems), most ....
G. Salton, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Addison-Wesley Publishers, 1989.
....data storage technology, e.g. optical disks, enable the storage of formatted and unformatted data, such as text, voice and image in the same database. The growing size of the databases necessitates the development of efficient file structures and search techniques for such multimedia environments [1,9]. Signature files provide a space efficient fast search structure by eliminating most of the irrelevant records by comparing the record signatures and the query signature without retrieving the actual records. In this paper, an instance of any kind of data will be referred to as a record. An ....
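The filtering step described in the excerpt above (compare record signatures with the query signature, then retrieve only the surviving records) can be sketched as follows. This is an illustration only: the excerpt does not say how signatures are built, so superimposed coding with hashed terms is an assumption here, as are the constants `SIG_BITS` and `BITS_PER_TERM` and all function names.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (illustrative choice)
BITS_PER_TERM = 3   # bit positions hashed per term (illustrative choice)


def term_mask(term):
    """Hash one term to a bit mask with up to BITS_PER_TERM bits set."""
    mask = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


def signature(terms):
    """Superimpose (bitwise OR) the masks of all terms of a record or query."""
    sig = 0
    for term in terms:
        sig |= term_mask(term.lower())
    return sig


def search(query_terms, records):
    """Return (candidates, matches) as lists of record indexes.

    candidates: records whose signature covers every bit of the query signature.
    matches: candidates confirmed against the actual record, which removes
    false drops caused by bit collisions.
    """
    signatures = [signature(r) for r in records]  # normally built once and stored
    q = signature(query_terms)
    candidates = [i for i, s in enumerate(signatures) if (s & q) == q]
    wanted = {t.lower() for t in query_terms}
    matches = [i for i in candidates if wanted <= {t.lower() for t in records[i]}]
    return candidates, matches


# Hypothetical records, each described by a few terms.
records = [
    ["signature", "file", "search"],
    ["image", "voice", "text"],
    ["signature", "text"],
]
candidates, matches = search(["signature"], records)
print(candidates, matches)
```

Only the candidate records need to be fetched and checked; a record that does not contain the query terms can still appear among the candidates when its bits happen to collide, which is why the final verification step is kept.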
G. Salton, Automatic Text Processing: The Transformation Analysis, and Retrieval of Information by Computer, Addison-Wesley, Reading, MA. (1989)
....are used in the literature. Some of them are the number of seek operations [KEN90], the signature reduction ratio (the ratio of the number of signatures searched to the total number of signatures in the signature file) [LEE89], the computation reduction ratio [LEE90, LEE95], and the response time [LIN92, ROB79, SAL89]. Some of these measures are not applicable to all indexing methods. For example, the signature reduction ratio is meaningless for the inverted file method. Consequently, there may be difficulties in the performance comparisons of different methods if a common performance measure is not used. ....
Salton, G. 1989. Automatic Text Processing: The Transformation Analysis, and Retrieval of Information by Computer. Addison Wesley, Reading, MA.
....extraction of certain words directly from the text of the document and thus, this process may be performed automatically. A number of strategies have been proposed so far to determine the words that are to be included from the document text. For an extensive survey of these strategies we refer to [21]. Once the process definition documents are mapped onto the representation language, the task of document classification is reduced to a comparison between the various document representations, i.e. vectors that contain the terms describing the document. Classically, document classification is ....
Salton, G., "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Reading: Addison-Wesley, 1989.
....conceptual representation of the existent huge databases in the legal domain. A promising direction to overcome this deficiency was the application of statistical analysis. Most work within this area is based on the vector space paradigm, which represents similarities as distances between vectors [18]. The similarity values are the basis for subsequent clustering, resulting in improved precision and recall with regard to the retrieval results [19, 20]. However, fine-tuning of the various parameters that influence cluster analysis remains a non-trivial task. CONCLUSION: In this paper we ....
Salton G, 1989, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer", Reading, MA: Addison-Wesley
....for content representation purposes may be words or phrases derived from the document texts by an automatic indexing procedure, and the term weights are computed by taking into account the occurrence characteristics of the terms in the individual documents and the document collection as a whole [7]. Assuming that every text, or text excerpt, is represented in vector form as a set of weighted terms, it is possible to compute pairwise similarity coefficients showing the similarity between pairs of texts based on coincidences in the term assignments to the respective items. Typically, the vector ....
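The weighting and pairwise similarity described in the excerpt above can be sketched in a few lines. The excerpt names neither a weighting formula nor a similarity coefficient, so this sketch assumes plain tf x log(N/df) weights and the cosine coefficient; the function names and the toy documents are also hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse {term: weight} vectors.

    Weight = term frequency in the document * log(N / document frequency),
    one common TF-IDF variant (an assumption, not taken from the excerpt).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tokenized texts.
docs = [
    ["text", "retrieval", "index"],
    ["text", "retrieval", "query"],
    ["image", "video", "audio"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two texts share terms and so receive a positive coefficient, while the first and third share none and score zero. Terms that occur in every document get weight zero under this variant, which reflects the "document collection as a whole" part of the weighting.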
G. Salton, Automatic Text Processing -- The Transformation Analysis and Retrieval of Information by Computer, Addison Wesley Publishing Company, Reading, MA, 1989.
No context found.
G. Salton: "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer," Addison Wesley, 1989.
No context found.
G. Salton, "Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer", Addison Wesley, 1989.
No context found.
Salton, G.: "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer". Addison-Wesley (1989)
No context found.
Salton, G. "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer". Addison-Wesley, Reading, MA. 1989.
No context found.
G. Salton: "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer," Addison Wesley, 1989.
No context found.
G. Salton, Automatic Text Processing: The Transformation Analysis, and Retrieval of Information by Computer, Addison-Wesley, Reading, MA. (1989)
No context found.
G. Salton, "Automatic Text Processing: the transformation, analysis, and retrieval of information by computer." Addison-Wesley, 1988.
No context found.
G. Salton, "Automatic text processing, the transformation analysis and retrieval of information by computer", Addison-Wesley, 1989.
No context found.
Salton G. (1989). Automatic Text Processing - The Transformation Analysis and Retrieval of Information by Computer, Addison Wesley.
CiteSeer.IST - Copyright Penn State and NEC