Jagadish H., Faloutsos C.: "Hybrid Index Organizations for Text Databases", Proceedings of the Extending Database Technology Conference (EDBT), pp. 310-327, March 1992.
....multiple copies of each real text block address, i.e., once for each key word present in the block. The index needs to frequently undergo reorganization under intensive information insertion/updating procedures. Also, the method is reported to perform poorly for multiple-term user queries [6]. The SC-SF intermediary index is a sequential structure with records consisting of a real text block address and a fixed-size binary signature. By construction, the scheme does not register search key values, and each real text block address is stored only once. Compared to the Inverted Index, ....
....which is introduced in the sequel. The new method is labelled S-Index, where S stands for Signature and Index implies the Inverted Index. 2. Description of S-Index. Hybrid structures which combine SC-SF with the Inverted Index are reported to improve the performance of the Signature File method [1, 6]. However, the information loss problem remains. Consequently, significant processing and I/O overhead is introduced during the full text scanning stage. In our case, the aim is to combine the query processing efficiency of ERSF with the information compression rate achieved by PE. In accordance ....
Jagadish H., Faloutsos C.: "Hybrid Index Organizations for Text Databases", Proceedings of the Extending Database Technology Conference (EDBT), pp. 310-327, March 1992.
....[Figure 2. Variations of a motive in Modes 1 and 2, with D4 serving as a melodic basis.] ...modified here to handle a Musical Database. The Audio Data File is a collection of original melodic data blocks. These blocks may contain both semantic data and comments appended in the form of text [4]. The Signature File stores the signature records of the audio blocks, and each record carries a pointer to the corresponding audio block. The Signature File and the Audio Data File may be kept and processed separately. The Signature File, which is much smaller, may be copied ....
Jagadish H., Faloutsos C., Hybrid Index Organizations for Text Databases, Proceedings of the Extending Database Technology Conference (EDBT), pp. 310-327, March 1992.
....they are rearranged in the heap or copied to other blocks with more space. Techniques for handling inverted lists larger than a disk block are not discussed, nor is the disk block technique fully evaluated. A more sophisticated inverted list implementation was proposed by Faloutsos and Jagadish [31]. In their scheme, small lists are stored as inverted lists, while large lists are stored as signature files. They have a similar goal of reducing the processing costs for long inverted lists, but their solution is inappropriate for the inference network model. In [32] Faloutsos and Jagadish ....
Faloutsos, C. and Jagadish, H. V. Hybrid index organizations for text databases. In Proc. of the 3rd Inter. Conf. on Extending Database Technology (EDBT), pages 310--327, 1992.
....We try to use a multi-attribute index structure [15] for standard attributes and document description vectors at the same time. Alternatively, we try to use conventional B-trees for the standard attributes and inverted lists for the document description vectors, exploiting the results presented in [9] and [2]. To perform effective query optimization in such an environment, a sophisticated cost model is needed. With respect to term-based retrieval, a crucial aspect is the vocabulary. At the moment, we use one vocabulary for the whole object base, which brings up a vocabulary with a large number ....
C. Faloutsos and H.V. Jagadish. Hybrid Index Organizations for Text Databases. In Proc. 3rd Intl. Conf. on Extending Database Technology, volume 580 of LNCS, pages 310--327, Vienna, Austria, 1992.
....In this paper, we consider group update operations, where a number of keys must or may be inserted or deleted at the same time. These operations, in particular group insertion, have attracted renewed interest because of applications in WWW search engines or document databases using inverted index techniques [4, 5], or in other applications where a large number of keys must or can be brought into the main index at the same time [9]. Structures with relaxed balance are well suited for concurrent applications of this nature because newly inserted elements are available immediately after the actual update. ....
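The following Python sketch roughly illustrates group insertion. It merges a whole batch of (term, document id) pairs into an in-memory inverted index in one pass. The batch is grouped by term first, so each postings list is rewritten once per batch rather than once per key.

This is a toy built on assumptions. The dict-of-sorted-lists layout and the name `group_insert` are hypothetical, and it does not model the relaxed-balance search trees discussed in the excerpt.

```python
from collections import defaultdict


def group_insert(index, batch):
    """Insert a batch of (term, doc_id) pairs into `index` in one pass.

    `index` is a hypothetical in-memory inverted index: a dict mapping each
    term to a sorted list of document ids. The batch is grouped by term so
    each postings list is rewritten at most once per batch.
    """
    by_term = defaultdict(set)
    for term, doc_id in batch:
        by_term[term].add(doc_id)
    for term, new_ids in by_term.items():
        postings = index.setdefault(term, [])
        # Merge old and new ids, dropping duplicates and keeping the list sorted.
        postings[:] = sorted(set(postings) | new_ids)
    return index


index = {"data": [1, 4]}
group_insert(index, [("data", 3), ("zoo", 7), ("data", 9), ("zoo", 2)])
print(index)  # {'data': [1, 3, 4, 9], 'zoo': [2, 7]}
```

Grouping first means the cost of touching a postings list is paid once per distinct term in the batch, which is the point of batching insertions.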
Christos Faloutsos and H. V. Jagadish. Hybrid Index Organizations for Text Databases. In Third International Conference on Extending Database Technology, volume 580 of Lecture Notes in Computer Science, pages 310-327, 1992.
....work can be considered an extension of these more traditional inverted list implementations, which simply do not provide the functionality required by the query processing optimizations I am considering. A more sophisticated inverted list implementation was proposed by Faloutsos and Jagadish [19]. In their scheme, small lists are stored as inverted lists, while large lists are stored as signature files. They have a similar goal of reducing the processing costs for long inverted lists, but their solution is inappropriate for the inference network model. In [20] Faloutsos and Jagadish ....
C. Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In Proc. of the 3rd Inter. Conf. on Extending Database Technology, pages 310--327, 1992.
....with an adaptive allocation scheme (not studied here) and a unique style that combines benefits of the whole and new styles. Performance comparisons between our work and the schemes presented there are difficult since updates are not batched in that paper. In another work, Faloutsos and Jagadish [3] extensively analyze a dual structure scheme based on signature schemes for long lists and inverted lists for short lists. The division in the structure is static as opposed to a dynamic scheme presented here. In addition, we believe that using inverted lists for short lists is computationally ....
Christos Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In A. Pirotte, C. Delobel, and G. Gottlob, editors, Proceedings 3rd International Conference on Extending Database Technology -- EDBT '92, Vienna, 1992. Springer--Verlag.
....Hybrid methods use inversion and signature files in the same retrieval system. The aim is to combine the advantages of both systems while eliminating the disadvantages. The method proposed by Faloutsos and Jagadish uses the posting list storage for rare terms and bit map storage for frequent terms [Faloutsos and Jagadish 1991]. For different environments different organizations for the bit map are proposed. The proposed method maintains the lookup table for all terms. Therefore, the space overhead generated by the lookup table is not eliminated. Also, the time required to search the lookup table is the same as in ....
....with respect to t1 (t1 is called the clustering term). Although the records containing t2 are distributed randomly, the number of on-blocks in the result of q2 will be less than or equal to the number of on-blocks in the result of q1. Assuming that the 80-20 rule holds [Knuth 1975, Faloutsos and Jagadish 1991], if only frequently used terms (about 20% of all terms) are used as clustering terms, then for the majority of one-word queries (about 80% of all queries) there will be only a few on-blocks in the result. Due to the discussion presented above, the probability of obtaining a random on-bit distribution at the ....
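The Python sketch below makes the on-block argument concrete under simplifying assumptions:

- each record is modelled as a set of terms;
- records are packed into fixed-size blocks;
- a block counts as an on-block for a term if any of its records contains that term;
- false drops from superimposed coding are ignored.

The helper names `count_on_blocks` and `cluster_by` are hypothetical.

```python
def count_on_blocks(records, term, block_size):
    """Count blocks holding at least one record that contains `term`."""
    on_blocks = 0
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        if any(term in record for record in block):
            on_blocks += 1
    return on_blocks


def cluster_by(records, term):
    """Store records containing `term` (the clustering term) contiguously."""
    return ([r for r in records if term in r]
            + [r for r in records if term not in r])


# Hypothetical data: 1000 records, every 20th one contains t1.
records = [{"t1"} if i % 20 == 0 else {"t2"} for i in range(1000)]
print(count_on_blocks(records, "t1", 10))                    # 50
print(count_on_blocks(cluster_by(records, "t1"), "t1", 10))  # 5
```

With one record in 20 containing t1 and 10 records per block, the scattered layout touches 50 blocks for a query on t1. Clustering on t1 packs the same 50 records into 5 blocks.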
FALOUTSOS, C., and JAGADISH, H. V., 1991. Hybrid index organizations for text databases, CS-TR-2621, Computer Science, Univ. of Maryland, 1991.
....than the methods presented here. Faloutsos and Jagadish [3] extensively analyze the physical organization of long lists. Performance comparisons between our work and the schemes presented there are difficult since updates are not batched in that paper. In another work, Faloutsos and Jagadish [2] extensively analyze a dual structure scheme based on signature schemes for long lists and inverted lists for short lists. The division in the structure is static as opposed to a dynamic scheme presented here. In addition, we believe that using inverted lists for short lists is computationally ....
Christos Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In A. Pirotte, C. Delobel, and G. Gottlob, editors, Proceedings 3rd International Conference on Extending Database Technology -- EDBT '92, Vienna, 1992. Springer--Verlag.
....the corresponding bit map representations may be stored. This will save space and processing time, since bit maps can be merged efficiently by bitwise operations. The method proposed by Faloutsos and Jagadish uses posting list storage for rare terms and bit map storage for frequent terms [FAL91]. Faloutsos and Jagadish proposed different organizations for using the bit map in different environments. The proposed method maintains the lookup table for all terms. Therefore, the time required to search the lookup table is the same as in other inversion methods. Also, the space overhead ....
....most convenient way. As we mentioned before, the posting lists are more compact representations of sparse columns of the BRTM. However, the posting list representation may not be feasible for all types of terms. For frequent terms, storing a bit map may be more feasible than storing a posting list [FAL91]. Similarly, if it is more compact, a bit slice of a BSSF may be stored like the posting lists of an IF. The methods proposed by Faloutsos and Chan demonstrate the distinction between a BSSF and an IF [FAL88b]. The Compressed Bit Slices (CBS) method proposed in [FAL88] is a BSSF method. In ....
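The size trade-off behind storing rare terms as posting lists and frequent terms as bit maps can be sketched in a few lines of Python. This is not the organization proposed in [FAL91]. It rests on these assumptions:

- an uncompressed cost model, where a posting list costs one ⌈log2 N⌉-bit document id per entry and a bit map costs N bits for N documents;
- Python integers are used as bit maps;
- all function names are hypothetical.

As in the excerpt, the lookup table keeps an entry for every term, and a conjunctive query is answered by a bitwise AND.

```python
import math


def to_bitmap(doc_ids):
    """Encode document ids as bits of a Python int (bit d set for document d)."""
    bitmap = 0
    for d in doc_ids:
        bitmap |= 1 << d
    return bitmap


def from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of document ids."""
    ids, d = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(d)
        bitmap >>= 1
        d += 1
    return ids


def build_lookup_table(postings, n_docs):
    """Keep an entry for every term: a posting list or a bitmap, whichever is smaller.

    Assumed cost model: a posting list costs df * ceil(log2 n_docs) bits,
    a bitmap always costs n_docs bits.
    """
    id_bits = max(1, math.ceil(math.log2(n_docs)))
    table = {}
    for term, doc_ids in postings.items():
        if len(doc_ids) * id_bits > n_docs:
            table[term] = ("bitmap", to_bitmap(doc_ids))
        else:
            table[term] = ("list", sorted(doc_ids))
    return table


def and_query(table, terms):
    """Answer a conjunctive query by ANDing the terms' bitmaps."""
    result = None
    for term in terms:
        kind, value = table[term]
        bits = value if kind == "bitmap" else to_bitmap(value)
        result = bits if result is None else result & bits
    return from_bitmap(result or 0)


postings = {
    "the": [0, 1, 2, 3, 5, 8, 9, 11, 13, 15],  # frequent term
    "data": [1, 3, 9, 12, 14],
    "zoo": [3, 9],                              # rare term
}
table = build_lookup_table(postings, n_docs=16)
print({t: kind for t, (kind, _) in table.items()})
# {'the': 'bitmap', 'data': 'bitmap', 'zoo': 'list'}
print(and_query(table, ["the", "data"]))  # [1, 3, 9]
```

With 16 documents each id costs 4 bits, so any term occurring in more than 4 documents is cheaper as a bit map.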
Faloutsos, C., Jagadish, H. V. 1991. Hybrid index organizations for text databases. CS-TR-2621, Computer Science, Univ. of Maryland.
....claim being that they occupy between 50 and 300 percent of the space of the text they index [Haskin 1981]. With current techniques, inverted files are stored in around 10% of the space of the text they index [Witten et al. 1994]. 3.3 Bitstring Signature Files. In signature file indexes [Faloutsos 1992], each record is allocated a fixed-width signature, or bitstring, of w bits. Each word that appears in the record is hashed a number of times to determine the bits in the signature that should be set, with no remedial action taken if two or more distinct words should happen (as is inevitable) to ....
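The following Python sketch illustrates the bitstring signature scheme described above:

- each word is hashed k times into a w-bit signature;
- a record's signature is the OR of its word signatures;
- colliding bits are simply left set.

Because collisions produce false matches, the search keeps every record whose signature covers the query bits, then rescans the candidate texts. The values W = 64 and K = 3, the SHA-256-based hashing and the function names are illustrative assumptions, not the parameters of any system cited here.

```python
import hashlib

W = 64  # signature width w in bits (assumed value)
K = 3   # bits set per word (assumed value)


def word_signature(word, w=W, k=K):
    """Hash `word` k times into a w-bit mask; colliding positions are not remedied."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % w)
    return mask


def record_signature(text, w=W, k=K):
    """Superimpose (OR) the signatures of every word in the record."""
    sig = 0
    for word in text.lower().split():
        sig |= word_signature(word, w, k)
    return sig


def search(texts, signatures, query_word):
    """Signature filter first, then rescan candidate texts to drop false matches."""
    q = word_signature(query_word.lower())
    candidates = [i for i, s in enumerate(signatures) if s & q == q]
    return [i for i in candidates if query_word.lower() in texts[i].lower().split()]


texts = ["hybrid index organizations",
         "signature files for text",
         "inverted index for text databases"]
signatures = [record_signature(t) for t in texts]
print(search(texts, signatures, "text"))   # [1, 2]
print(search(texts, signatures, "index"))  # [0, 2]
```

The second, text-scanning stage is what removes false matches. It is also the source of the extra processing and I/O cost that several excerpts on this page attribute to signature files.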
....each distinct term appears on average twice per record, then about 25 bits per word occurrence are required by the index, corresponding to approximately 50% of the space occupied by the input text. Note that standard bitstring signature files are claimed to be substantially more compact than this [Faloutsos 1992], since there is no disk access penalty for having a higher bit density. In fact, as will be shown below, the difference is negligible; for a given false match rate bitstring signature files are only slightly smaller than bitsliced signature files. Moreover, query processing costs are much ....
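The standard Bloom-filter-style approximation may help with intuition about the false match rate mentioned above. Assume a record with n distinct words, each setting k bits chosen independently and uniformly among w. This is an idealization, not necessarily the model used in the excerpted paper. Under it, the probability that a single-word query falsely matches the record, and its minimum over k, are approximately:

```latex
F \approx \left(1 - \left(1 - \tfrac{1}{w}\right)^{kn}\right)^{k}
  \approx \left(1 - e^{-kn/w}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{w}{n}\ln 2
\;\Rightarrow\;
F_{\min} \approx 2^{-k_{\mathrm{opt}}} = e^{-\frac{w}{n}(\ln 2)^{2}} \approx 0.6185^{\,w/n}
```

Under this approximation, holding F fixed requires w/n to stay constant. The signature width must therefore grow in proportion to the number of distinct words per record.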
Faloutsos, C. and Jagadish, H. 1992. Hybrid index organizations for text databases. In A. Pirotte, C. Delobel, and G. Gottlob, Eds., Proc. 3rd International Conference on Extending Database Technologies. Springer-Verlag, Berlin, pp. 310--327. LNCS 580.
....that all buckets can fit in main memory during indexing, potentially requiring significant main memory resources. Our scheme makes no such assumption, requiring substantially less main memory. Another scheme that handles large lists distinctly from small lists is proposed by Faloutsos and Jagadish [FJ92a]. In their scheme, small lists are stored as inverted lists, while large lists are stored as signature files. Again, we are primarily concerned with inverted lists and do not consider signature file solutions. In [FJ92b] Faloutsos and Jagadish examine update and storage costs for a family of long ....
Christos Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In Proc. of the 3rd Inter. Conf. on Extending Database Technology, pages 310--327, 1992.
....which, in the absence of compression, can be as large as the textbase itself [20]. Also, under intensive information insertion/updating procedures, the index is subject to node splitting, which degrades performance. File inversion is also reported to perform poorly for multiple-term user queries [15]. Due to the large size of the index file, the scheme is inappropriate for taking advantage of a network environment by having remote clients utilize locally stored copies of the index and establish (logical) access paths to selected sections of a server-resident textbase. In the case of SC-SF, ....
....of the signature file with those of file inversion. The former excels in storage utilization, whereas the latter does best in query processing efficiency. Hybrid structures have been proposed which combine SC-SF with the inverted index and improve the performance of the signature file method [3, 15]. However, they involve information loss: significant processing and I/O overhead is introduced by the full text scanning stage. In accordance with what has been stated in Section 4, amongst the signature file variations: (a) ERSF excels in query processing efficiency, and (b) PE is the most ....
Jagadish H. and Faloutsos C.: "Hybrid Index Organizations for Text Databases", Proceedings Extended Database Technology Conference (EDBT'92), pp.310-327, 1992.
....insertions. [Figure: inverted file, with dictionary entries (Aaron, data, zoo), the document file, and a new document.] • STAIRS [IBM] • MEDLARS • DIALOG, ORBIT, LEXIS • refer/lookbib [Les78]. Recent developments/challenges: • skewness of distribution (Zipf's law) [Zip49]: hybrid methods [FJ92a], adaptive postings lists [FJ92b] • huge indices; fast insertions: Tomasic et al. [TGMS94]; Cutting and Pedersen [CP90] exploit skewness (short lists in a B-tree, long lists in a separate file); Zobel et al. [ZMSD92] use Elias's [Eli75] compression scheme for postings lists; glimpse [MW94] uses ....
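Zobel et al. are cited above for compressing postings lists with an Elias code. A common way to do this is sketched below in Python. Which of Elias's codes they use is an assumption, since the excerpt does not say. The sketch stores each list as d-gaps (differences between consecutive document ids) and encodes every gap g with the Elias gamma code: ⌊log2 g⌋ zeros followed by g in binary.

Further assumptions:

- function names are hypothetical;
- document ids are positive and strictly increasing;
- bits are kept as a Python string for readability rather than packed into bytes.

```python
def to_gaps(doc_ids):
    """Turn strictly increasing, positive doc ids into d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def elias_gamma(n):
    """Elias gamma code: floor(log2 n) zeros, then n in binary (n >= 1)."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_postings(doc_ids):
    return "".join(elias_gamma(g) for g in to_gaps(doc_ids))


def decode_postings(bits):
    """Read gamma codes back and undo the d-gap transform."""
    doc_ids, pos, prev = [], 0, 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        prev += gap
        doc_ids.append(prev)
    return doc_ids


encoded = encode_postings([3, 5, 20, 21])  # gaps: 3, 2, 15, 1
print(encoded)                   # 01101000011111
print(decode_postings(encoded))  # [3, 5, 20, 21]
```

The four gaps take 3 + 3 + 7 + 1 = 14 bits. Small gaps, which dominate the lists of frequent terms, receive the shortest codes.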
C. Faloutsos and H.V. Jagadish. Hybrid index organizations for text databases. EDBT '92, pages 310--327, March 1992. Also available as UMIACS-TR-91-33 and CS-TR-2621.
.... the effects of skewness in the frequencies of query terms (e.g., Zipf distribution [Zip49]), (b) the parallelization of vertical [LF92] and horizontal [SD83, LL89] partitioning signature methods, and (c) the parallelization of hybrid methods that combine signature retrieval with inverted indices [FJ91]. ....
Christos Faloutsos and H. V. Jagadish. Hybrid Index Organizations for Text Databases. Technical Report UMIACS-TR-91-33 and CS-TR-2621, Department of Computer Science, University of Maryland, March 1991.