46 citations found.
M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In 1st Intl. Conf. Knowledge Discovery and Data Mining, Aug. 1995.

This paper is cited in the following contexts:

First 50 documents

Mining First-order Knowledge Bases for Association Rules - Jamil   (Correct)

....intersection node. The least fixpoint of the intersection process helped because we know that we have computed all the meet irreducible elements by now and no other intersection nodes exists. There are several works that have investigated the issue of declarative association rule mining using SQL [7, 23, 19, 16, 11]. Most of these works, specially [23, 19] attempt to simulate apriori in SQL giving rise to a complicated and awkward method. They do not exploit the inherent declarative properties of transaction databases as we have identified in this paper. The inherent procedurality of their proposed ....

Marcel Holsheimer, Martin L. Kersten, Heikki Mannila, and Hannu Toivonen. A perspective on databases and data mining. In Proc. of the sixth ACM SIGKDD Intl. Conf., pages 150-155, Montreal, Quebec, 1995.


MINTO: A Software Tool for Mining Manufacturing Databases - Haritsa   (Correct)

....These features are described in detail in Section 7.2. Previous Work Although the vertical layout has several positive features, as described above, it has received comparatively little attention in the data mining literature. In fact, to the best of our knowledge, it has been considered only in [13, 16] and a series of papers by Zaki et al. [18, 20, 19] The vertical layout was first proposed in [13] which described how, with this layout, itemset supports can be counted using only the simple set operations of union and intersection. Subsequently, a graph based algorithm called DLG (Direct Large ....

....several positive features, as described above, it has received comparatively little attention in the data mining literature. In fact, to the best of our knowledge, it has been considered only in [13, 16] and a series of papers by Zaki et al. [18, 20, 19] The vertical layout was first proposed in [13] which described how, with this layout, itemset supports can be counted using only the simple set operations of union and intersection. Subsequently, a graph based algorithm called DLG (Direct Large itemset Generation) was proposed in [16] These ideas were further developed in the Eclat and ....

[Article contains additional citation context not shown here]

M. Holsheimer, M. Kersten, H. Mannila and H. Toivonen. A perspective on databases and data mining. In Intl. Conf. on Knowledge Discovery and Data Mining, August 1995.


Turbo-charging Vertical Mining of Large Databases - Shenoy, Haritsa, Sudarshan, .. (2000)   (30 citations)  (Correct)

....fast and simple support counting, for reducing the effective database size, for compact storage of the database, for better support of dynamic databases, and for asynchrony in the counting process. Based on these observations, a variety of vertical mining algorithms have been proposed recently [3, 4, 6, 8, 10]. Performance evaluations of these algorithms has indicated that they can provide significantly faster mining times as compared to their horizontal counterparts. While the above mentioned algorithms have served to highlight the utility of the vertical approach, they all suffer from a common ....

....particular TID is added only once to a snake. 7 Related Work In the previous sections, we described the functioning of our new VIPER algorithm. We now move on to reviewing the prior work in vertical mining algorithms. Algorithms for (sequential) vertical mining have been previously presented in [6, 8, 10, 4, 3]. We restrict our attention here to the most recent among these, namely MaxClique [10], ColumnWise [3] and Hierarchical BitMap [4]. [Figure 7: Bottom up and Top Down Approaches to Snake Writing] The ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proc. of 1st Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1995.


Turbo-charging Vertical Mining of Large Databases - Shenoy, Haritsa, Sudarshan, .. (2000)   (30 citations)  (Correct)

....fast and simple support counting, for reducing the effective database size, for compact storage of the database, for better support of dynamic databases, and for asynchrony in the counting process. Based on these observations, a variety of vertical mining algorithms have been proposed recently [3, 4, 6, 9, 11]. Performance evaluations of these algorithms has indicated that they can provide significantly faster mining times as compared to their horizontal counterparts. While the above mentioned algorithms have served to highlight the utility of the vertical approach, they all suffer from a common ....

....ABCD because writes to AB may be made from all of the different top level candidates for which it forms part of the cover. It is, of course, ensured that a particular TID is added only once to a snake. 7 Related Work Algorithms for (sequential) vertical mining have been previously presented in [6, 9, 11, 4, 3]. We restrict our attention here to the most recent among these, namely MaxClique[11] ColumnWise[3] and Hierarchical BitMap[4] The MaxClique algorithm, which is a pioneering effort in the development of the vertical mining approach, is based on a vertical tid list (VTL) format. It first ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proc. of 1st Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1995.


Turbo-charging Vertical Mining of Large Databases - Shenoy, Bhalotia, Haritsa.. (2000)   (30 citations)  (Correct)

....fast and simple support counting, for reducing the effective database size, for compact storage of the database, for better support of dynamic databases, and for asynchrony in the counting process. Based on these observations, a variety of vertical mining algorithms have been proposed recently [3, 4, 6, 9, 11]. Performance evaluations of these algorithms has indicated that they can provide significantly faster mining times as compared to their horizontal counterparts. While the above mentioned algorithms have served to highlight the utility of the vertical approach, they all suffer from a common ....

....ABCD because writes to AB may be made from all of the different top level candidates for which it forms part of the cover. It is, of course, ensured that a particular TID is added only once to a snake. 7 Related Work Algorithms for (sequential) vertical mining have been previously presented in [6, 9, 11, 4, 3]. We restrict our attention here to the most recent among these, namely MaxClique[11] ColumnWise[3] and Hierarchical BitMap[4] The MaxClique algorithm, which is a pioneering effort in the development of the vertical mining approach, is based on a vertical tid list (VTL) format. It first ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proc. of 1st Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1995.


Mining Association Rules From Market Basket Data.. - Hilderman.. (1998)   (Correct)

....of share for itemsets, and redefine the notions of frequent itemsets and confidence. We refer to this extended formalism as the share-confidence framework for association rules and refer to the new itemset measures as simply share measures. In this framework, any of the algorithms presented in [2, 3, 16, 19, 22, 23, 29, 30, 31, 32, 33] can be used to generate frequent itemsets using our new definition for frequent itemset. The definitions in this section have been implemented in a data mining system for analyzing market basket data. This system is an extension of DB Discover, a software tool for knowledge discovery from databases ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD'95), pages 150--155, August 1995.


Parallel Data Mining for Association Rules on.. - Parthasarathy, Zaki.. (2000)   (5 citations)  (Correct)

....of Partition and produce a much smaller set of potentially frequent candidates. It requires at most two database scans. Also, sampling may be used to eliminate the second pass altogether. Approaches using only general purpose DBMS systems and relational algebra operations have also been studied [22, 23]. All the above algorithms generate all possible frequent itemsets. Methods for finding the maximal elements include All MFS [19] which is a randomized algorithm to discover maximal frequent itemsets. The Pincer Search algorithm [27] not only constructs the candidates in a bottom up manner like ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In 1st Intl. Conf. Knowledge Discovery and Data Mining, August 1995.


Multi-Relational Data Mining - Knobbe, Blockeel, Siebes, van der.. (1999)   (11 citations)  (Correct)

....to test these candidates. This way, a clear separation between handling massive datasets and performing intelligent search is achieved. Each operation can now be optimised separately. A lot of work has been done on implementing efficient client-server architectures for mining attribute-value data [5, 7, 8, 9, 10, 11, 13]. Most of this work is centred around expressing primitives in SQL in order to use a conventional RDBMS, or extending the SQL language to support potential specific needs the data mining algorithm may have. An alternative approach is to use a dedicated data mining server which is optimised for the ....

Holsheimer, M., Kersten, M., Mannila, H., Toivonen, H. A Perspective on Databases and Data Mining, Proceedings KDD '95, 1995


Quality of Service and Electronic Newspaper: The Etel.. - Issarny, Banatre.. (2000)   (Correct)

....and/or page groups relate to the data analysis [18, 9, 26, 14, 12] and data mining [1, 27, 21, 8] domains. An evaluation of the various eligible algorithms with respect to our criteria, is proposed in [6]. Based on the result of this evaluation, our algorithm is based on associative data mining [2, 13, 4]. Briefly stated, associative data mining computes a set of inference rules among database elements (or items) according to the transactions stored in the database where each transaction contains a set of database elements (or itemset). In the Etel context, an item corresponds to a newspaper ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proc. of the 1st International Conference on Knowledge Discovery and Data Mining, 1995.


Parallel Data Mining for Association Rules on Shared-memory.. - Zaki, al. (1998)   (30 citations)  (Correct)

....of Partition and produce a much smaller set of potentially frequent candidates. It requires at most two database scans. Also, sampling may be used to eliminate the second pass altogether. Approaches using only general purpose DBMS systems and relational algebra operations have also been studied (Holsheimer et al. 1995; Houtsma Swami 1995) All the above algorithms generate all possible frequent itemsets. Methods for finding the maximal elements include All MFS (Gunopulos, Mannila, Saluja 1997) which is a randomized algorithm to discover maximal frequent itemsets. The Pincer Search algorithm (Lin Kedem ....

Holsheimer, M.; Kersten, M.; Mannila, H.; and Toivonen, H. 1995. A perspective on databases and data mining. In 1st Intl. Conf. Knowledge Discovery and Data Mining.


Extended Concepts for Association Rule Discovery - Rantzau (1997)   (Correct)

....have been read, where $M \le |D|$, and stops counting them after the k-itemsets have seen all transactions, i.e. after $|D| + (k-1)M$ transactions. Apriori is a special case of DIC, with M set to $|D|$. Thus, DIC outperforms Apriori if M is chosen appropriately. An algorithm presented in [HKMT95] uses only database operations like union and intersection of a general purpose database management system. The transaction database that is assumed there is stored as a decomposed storage structure, i.e. items are stored as columns where in each column all identifiers of transactions are stored ....

Marcel Holsheimer, Martin Kersten, Heikki Mannila, and Hannu Toivonen. A Perspective on Databases and Data Mining. In Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, Montreal, Quebec, Canada, pages 150--155, August 1995.


Mining Association Rules: Deriving a Superior Algorithm .. - Hipp, Güntzer.. (2000)   (7 citations)  (Correct)

....that DFS does not allow proper candidate pruning by subset checking makes BFS somewhat superior to DFS. But we must keep in mind that checking subsets might be costly, especially for larger itemsets. In addition, it makes only sense for itemsets of a size greater than 2. But as Figure 2(d) and [7] show, the time spent with the itemsets of size 2 may dominate the whole generation process. Counting occurrences is usually done by using a hashtree, cf. [2]. Counting a candidate that occurs rather infrequently is quite cheap. Costs are only caused by the actual occurrences of the candidate in ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proc. of the 1st Int'l Conf. on Knowledge Discovery and Data Mining (KDD '95), Montreal, Canada, August 1995.


A Case Study in Knowledge Acquisition for Insurance Risk.. - Williams, Huang (1996)   (3 citations)  (Correct)

.... function governs the selection of the patterns which are of interest to the user and has been identified by others as an integral part of KDD (Frawley et al. 1992, Matheus et al. 1993, Fayyad, Piatetsky Shapiro and Smyth 1996) Pattern evaluation is also important in reducing the search space (Holsheimer, Kersten, Mannila and Toivonen 1995). Formally, a Pattern Evaluation function F is a function that maps from a set of statements expressed in some language L (e.g. production rules) to a set of (usually) numeric values. The evaluation might consider each pattern in the context of all discovered patterns, or patterns might be ....

Holsheimer, M., Kersten, M., Mannila, H. and Toivonen, H.: 1995, A perspective on databases and data mining, Proc. of the First International Conference on Knowledge Discovery and Data Mining, AAAI Press, pp. 150--155.


Modelling the KDD Process - Four Stage Process   (Correct)

.... function governs the selection of the patterns which are of interest to the user and has been identified by others as an integral part of KDD (Frawley et al. 1992, Matheus et al. 1993, Fayyad, PiatetskyShapiro and Smyth 1996) Pattern evaluation is also important in reducing the search space (Holsheimer, Kersten, Mannila and Toivonen 1995). Formally, a Pattern Evaluation function F is a function that maps from a set of statements expressed in L (e.g. production rules) to a set of (usually) numeric values. The evaluation might consider each pattern in the context of all discovered patterns, or patterns might be evaluated for their ....

Holsheimer, M., Kersten, M., Mannila, H. and Toivonen, H.: 1995, A perspective on databases and data mining, Proc. of the First International Conference on Knowledge Discovery and Data Mining, AAAI Press, pp. 150--155.


A New Algorithm for Faster Mining of Generalized.. - Hipp, Myka, Wirth.. (1998)   (6 citations)  (Correct)

.... studies in literature seem to be contradictory concerning the different approaches of support counting: According to [5] the algorithm Partition that relies on tid intersections achieves a much better performance than Apriori that counts actual occurrences (both use BFS). On the other hand in [6] it is shown that tid intersections usually are more expensive than counting actual occurrences. Our own experiments support [6]. Even when extending Eclat to prune candidates that have an infrequent subset cf Subsection 4.2 Eclat does not perform better than Apriori for k itemsets with k 1 ....

....the algorithm Partition that relies on tid intersections achieves a much better performance than Apriori that counts actual occurrences (both use BFS). On the other hand in [6] it is shown that tid intersections usually are more expensive than counting actual occurrences. Our own experiments support [6]: Even when extending Eclat to prune candidates that have an infrequent subset cf Subsection 4.2 Eclat does not perform better than Apriori for k itemsets with k 1 on datasets comparable to those in [5] 4 Algorithm Prutax Based on the insights from the boolean case described in Section 3, ....

M. Holsheimer, M. Kersten, Heikki Mannila, Hannu Toivonen, A Perspective on Databases and Data Mining, In Proc. of the KDD '95, 1995, Montreal, Canada.


Mining Navigation History for Recommendation - Fu (2000)   (17 citations)  (Correct)

....to use this method. SurfLen is an information recommendation system which suggests interesting web pages to users. The underlying data mining technique of SurfLen is Association Rules. 2. ASSOCIATION RULES One of the more well studied problems in data mining is the mining of association rules [1, 2, 4, 5] in market basket data, which was first introduced by Agrawal, Imielinski and Swami [1, 2]. Market basket data stores items purchased on a per transaction basis for a sales organization. Basket data type transactions do not necessarily consist of items bought at the same time. The data may be ....

....we can not as yet prove our assumption true in a formal way, we are confident that applying association rules to navigation history would provide good recommendation. 6. RELATED WORK Since the introduction of association rules, many algorithms have been proposed to improve their performance [1, 2, 4, 5]. While our work builds heavily on previous work, our research has its root in navigation history tracking and information recommender systems. Although not many systems have been built for navigation history collection, many research systems for recommending information have been built using ....

Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. A perspective on database and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (Montreal, Canada, 1996).


Updating Large Itemsets with Early Pruning - Ayan (1999)   (Correct)

....we mentioned previously. It counts the supports of the candidates over the created tidlists instead of the database. The major advantages of Partition are the reduction in I/O cost, and usage of main memory while computing large itemsets. MONET System [HKMT95] discovers association rules by using only a general purpose database management system and the operations of relational algebra, union and intersection operations. The database is stored as a set of items (columns) where TIDs of the transactions that contain the item are enumerated in this ....

Marcel Holsheimer, Martin Kersten, Heikki Mannila, and Hannu Toivonen. A perspective on databases and data mining. In Proceedings of 1st Intl. Conf. on Knowledge Discovery and Data Mining (KDD'95), pages 150--155, Montreal, Canada, August 1995.


A New Algorithm for Faster Mining of Generalized.. - Hipp, Myka, Wirth.. (1998)   (6 citations)  (Correct)

.... studies in literature seem to be contradictory concerning the different approaches of support counting: According to [5] the algorithm Partition that relies on tid intersections achieves a much better performance than Apriori that counts actual occurrences (both use BFS). On the other hand in [6] it is shown that tid intersections usually are more expensive than counting actual occurrences. Our own experiments support [6]. Even when extending Eclat to prune candidates that have an infrequent subset cf Subsection 4.2 Eclat does not perform better than Apriori for k itemsets with k 1 ....

....the algorithm Partition that relies on tid intersections achieves a much better performance than Apriori that counts actual occurrences (both use BFS). On the other hand in [6] it is shown that tid intersections usually are more expensive than counting actual occurrences. Our own experiments support [6]: Even when extending Eclat to prune candidates that have an infrequent subset cf Subsection 4.2 Eclat does not perform better than Apriori for k itemsets with k 1 on datasets comparable to those in [5] 4 Algorithm Prutax Based on the insights from the boolean case described in Section ....

M. Holsheimer, M. Kersten, Heikki Mannila, Hannu Toivonen, A Perspective on Databases and Data Mining, In Proc. of the KDD '95, 1995, Montreal, Canada.


Scalable Algorithms for Association Mining - Zaki (2000)   (16 citations)  (Correct)

....The AS CPA algorithm and its sampling versions [20] build on top of Partition and produce a much smaller set of potentially frequent candidates. It requires at most two database scans. Approaches using only general purpose DBMS systems and relational algebra operations have also been studied [14, 15]. Detailed architectural alternatives in the tight integration of association mining with DBMS were presented in [25]. They also pointed out the benefits of using the vertical database layout. All the above algorithms generate all possible frequent itemsets. Methods for finding the maximal elements ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In 1st Intl. Conf. Knowledge Discovery and Data Mining, August 1995.


A Parallel Data Mining Architecture for Massive Data Sets - George, Knobbe (1999)   (Correct)

....large number of algorithms can be expressed as variations on such a top down search for patterns. The application of a level wise algorithm to the discovery of a variety of patterns (called sentences by the authors) is discussed in [16]. Examples include the discovery of association rules ([2], [10], [16], [20]), strong rules, and inclusion dependencies. 3.4.3. Creation and use of pCORES 3.4.4. Data Mining Performance 4. Client Another application of the level wise algorithm is described in [14]. It discusses the discovery of keys and foreign key relations. The level wise algorithm is ....

....trees that require full table scans at each node, and hence do not benefit from the zooming effect, will require O(m n p(d) time to compute all relevant information at level d, which is clearly slower. Another popular paradigm in data mining is the discovery of association rules ([1], [2], [5], [10], [13], [15], [21]). Given a set of transactions, where each transaction consists of a set of items, an association rule is an expression X ⇒ Y, where X and Y are sets of items. Such a rule indicates that transactions that contain X are likely to contain Y also. Two measures, confidence and ....

Holsheimer, M., Kersten, M., Mannila, H., Toivonen, H. A Perspective on Databases and Data Mining, Proceedings KDD '95.


Discovery of Relational Association Rules - Dehaspe, Toivonen (2000)   (7 citations)  Self-citation (Toivonen)   (Correct)

....is natural for a large number of data mining problems. Patterns that are rare, e.g. that concern only a couple of customers, are probably not reliable nor useful for the user. Problem settings that are close to the problem of discovering association rules include the use of item type hierarchies [14, 15, 31], the discovery of episodes in event sequences [21, 23] and the search of sequential patterns from series of transactions [4, 32] For all these cases the pattern language is more complex than in the market basket application, and specialized algorithms exist for the tasks. We present a powerful ....

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pages 150 -- 155. AAAI Press, Menlo Park, CA, August 1995.


Scalable Storage for a DBMS using Transparent Distribution - Karlsson, Kersten (1997)   Self-citation (Kersten)   (Correct)

....optimizations Live Optimization. • High throughput achieved using bulk operator processing. This enables Monet to benefit most from modern computers cache line behavior, on the expense of that intermediate results are fully materialized. Monet is heavily used in Data Mining applications [HKM95] and GIS [BQK96] and its supreme performance has been demonstrated against several benchmarks, including OO7 [BKK96] and TPCD [BWK98]. The current implementation runs on workstations and exploits parallelism of SMP machines. However, shared memory computers provide limited means to scale. ....

M. Holsheimer, M. L. Kersten, and M. L. Mannilla. A Perspective on Databases and Data Mining. Montreal, Canada, 1995.


Mining the Smallest Association Rule Set for Predictions - Jiuyong Li Hong (2001)   (1 citation)  (Correct)

No context found.

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In 1st Intl. Conf. Knowledge Discovery and Data Mining, Aug. 1995.


Cubegrades - Generalization Of Association Rules To Mine Large.. - Abdulghani   (Correct)

No context found.

M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD'95), pages 150 -- 155, Montreal, Canada, August 1995.


Mining Is-Part-Of Association Patterns From Semistructured Data - Wang, Liu   (Correct)

No context found.

M. Holsheimer, M. Kersten, H. Mannila, H. Toivonen, "A perspective on databases and data mining", KDD 1995, 150-155

CiteSeer.IST - Copyright Penn State and NEC