Data Mining for Path Traversal Patterns in a Web Environment
, 1996
Cited by 130 (1 self)
Abstract
In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment such as the World Wide Web. First, we convert the original sequence of log data into a set of maximal forward references, filtering out the effect of backward references, which are made mainly for ease of traveling. Second, we derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on hashing and pruning techniques, and the other improves on it by determining large reference sequences in batches so as to reduce the number of database scans required. The performance of the two methods is analyzed comparatively.
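The two steps above can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: it assumes a backward reference is a revisit of a page already on the current forward path, and that a reference sequence is a run of consecutive pages counted at most once per maximal forward reference. The function names `maximal_forward_references` and `count_reference_sequences` are hypothetical.

```python
from collections import Counter


def maximal_forward_references(traversal):
    """Split one user's page traversal into maximal forward references."""
    path = []        # current forward path from the starting page
    refs = []        # maximal forward references found so far
    forward = True   # True if the previous step extended the path
    for page in traversal:
        if page in path:
            # Backward reference: the page is already on the current path.
            # If we were moving forward, the path so far is maximal.
            if forward:
                refs.append(list(path))
            forward = False
            # Truncate the path back to the revisited page.
            del path[path.index(page) + 1:]
        else:
            path.append(page)
            forward = True
    if forward and path:
        refs.append(list(path))
    return refs


def count_reference_sequences(refs, k):
    """Count in how many references each consecutive length-k sequence occurs."""
    counts = Counter()
    for ref in refs:
        # A set, so each sequence is counted at most once per reference.
        counts.update({tuple(ref[i:i + k]) for i in range(len(ref) - k + 1)})
    return counts
```

For the traversal `"ABCDCBEGHGWAOUOV"`, the first function yields the maximal forward references ABCD, ABEGH, ABEGW, AOU, and AOV, and the pair `("A", "B")` then occurs in three of them. A sequence would be reported as large when its count reaches a chosen minimum support.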
Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization
, 1996
Cited by 128 (9 self)
Abstract
We discuss data mining based on association rules for two numeric attributes and one Boolean attribute. For example, in a database of bank customers, “Age” and “Balance” are two numeric attributes, and “CardLoan” is a Boolean attribute. Taking the pair (Age, Balance) as a point in two-dimensional space, we consider an association rule of the form ((Age, Balance) ∈ P) ⇒ (CardLoan = Yes), which implies that bank customers whose ages and balances fall in a planar region P tend to use card loans with high probability. We consider two classes of regions: rectangles and admissible (i.e., connected and x-monotone) regions. For each class, we propose efficient algorithms for computing the regions that give optimal association rules for gain, support, and confidence, respectively. We have implemented the algorithms for admissible regions and constructed a system for visualizing the rules.
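To make support, confidence, and gain concrete for the rectangle case, here is a brute-force sketch over a bucketed grid. It assumes the data has been bucketed into a grid where `u[i][j]` counts customers in Age bucket `i` and Balance bucket `j`, and `v[i][j]` counts those with CardLoan = Yes. It also assumes the gain of a region is `v(P) - theta * u(P)` for a minimum-confidence threshold `theta`. The function names are hypothetical, and the exhaustive search over all rectangles stands in for, and is much slower than, the efficient algorithms the paper proposes.

```python
def region_stats(u, v, rect):
    """Support and confidence of the rule for grid rectangle (i1, i2, j1, j2)."""
    i1, i2, j1, j2 = rect
    cells = [(i, j) for i in range(i1, i2 + 1) for j in range(j1, j2 + 1)]
    inside = sum(u[i][j] for i, j in cells)   # customers in the region
    hits = sum(v[i][j] for i, j in cells)     # of those, CardLoan = Yes
    total = sum(map(sum, u))
    support = inside / total
    confidence = hits / inside if inside else 0.0
    return support, confidence


def best_gain_rectangle(u, v, theta):
    """Exhaustively find the grid rectangle maximizing v(P) - theta * u(P)."""
    n, m = len(u), len(u[0])
    # P[i][j] = sum of per-cell gains over rows < i and columns < j.
    P = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            w = v[i][j] - theta * u[i][j]
            P[i + 1][j + 1] = w + P[i][j + 1] + P[i + 1][j] - P[i][j]
    best, best_rect = float("-inf"), None
    for i1 in range(n):
        for i2 in range(i1, n):
            for j1 in range(m):
                for j2 in range(j1, m):
                    # Rectangle sum from the 2-D prefix table in O(1).
                    g = P[i2 + 1][j2 + 1] - P[i1][j2 + 1] - P[i2 + 1][j1] + P[i1][j1]
                    if g > best:
                        best, best_rect = g, (i1, i2, j1, j2)
    return best_rect, best
```

The prefix-sum table makes each rectangle's gain an O(1) lookup, so the search costs O(n²m²) for an n × m grid. This is fine for illustration but is the kind of cost the paper's algorithms are designed to avoid.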
RainForest: A Framework for Fast Decision Tree Construction of Large Datasets
 In VLDB
, 1998
Cited by 120 (8 self)
Abstract
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART, ...
A Survey of Temporal Knowledge Discovery Paradigms and Methods
 IEEE Transactions on Knowledge and Data Engineering
, 2002
Cited by 119 (8 self)
Abstract
With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing, and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining.
Index Terms: Temporal data mining, time sequence mining, trend analysis, temporal rules, semantics of mined rules.
BOAT: Optimistic Decision Tree Construction
, 1999
Cited by 117 (2 self)
Abstract
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. Decision trees are a very popular class of classifiers. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any differen...
Efficient algorithms for geometric optimization
 ACM Comput. Surv.
, 1998
Cited by 114 (10 self)
Abstract
We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, prune-and-search techniques for linear programming and related problems, and LP-type problems and their efficient solution. We then describe a variety of applications of these and other techniques to numerous problems in geometric optimization, including facility location, proximity problems, statistical estimators and metrology, placement and intersection of polygons and polyhedra, and ray shooting and other query-type problems.
A General Incremental Technique for Maintaining Discovered Association Rules
 In Proceedings of the Fifth International Conference On Database Systems For Advanced Applications
, 1997
Cited by 110 (5 self)
Abstract
A more general incremental updating technique is developed for maintaining the association rules discovered in a database in cases including insertion, deletion, and modification of transactions. A previously proposed algorithm, FUP, can only handle the maintenance problem in the case of insertion. The proposed algorithm, FUP2, makes use of the previous mining result to cut down the cost of finding the new rules in an updated database. In the insertion-only case, FUP2 is equivalent to FUP. In the deletion-only case, FUP2 is a complementary algorithm to FUP that is very efficient when the deleted transactions are a small part of the database, which is the most applicable case. In the general case, FUP2 can efficiently update the discovered rules when new transactions are added to a transaction database and obsolete transactions are removed from it. The proposed algorithm has been implemented, and its performance is studied and compared with the best algorithms for mining...
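The reuse of previous mining results can be illustrated for itemsets that were already large. In that case only the inserted and deleted transactions need to be scanned, because the new count is the old count plus the count in the insertions minus the count in the deletions. The sketch below is a hypothetical simplification and not FUP2 itself. It covers only previously large itemsets, whereas the full algorithm must also handle new candidates that were not large before. The names `count_in` and `update_large` are hypothetical, and `min_sup` is a fraction of the database size.

```python
from collections import Counter


def count_in(transactions, itemsets):
    """Count how many transactions contain each itemset (itemsets are frozensets)."""
    counts = Counter()
    for t in transactions:
        ts = set(t)
        for s in itemsets:
            if s <= ts:
                counts[s] += 1
    return counts


def update_large(old_counts, db_size, inserted, deleted, min_sup):
    """Re-check previously large itemsets after inserting and deleting transactions."""
    itemsets = list(old_counts)
    plus = count_in(inserted, itemsets)    # scan only the inserted transactions
    minus = count_in(deleted, itemsets)    # scan only the deleted transactions
    new_size = db_size + len(inserted) - len(deleted)
    new_counts = {s: old_counts[s] + plus[s] - minus[s] for s in itemsets}
    still_large = {s: c for s, c in new_counts.items() if c >= min_sup * new_size}
    return still_large, new_size
```

With a 10-transaction database where {a} had count 6 and {a, b} had count 4, inserting [a, b] and [c] and deleting [a, b], [a], and [b] leaves 9 transactions. The counts become 5 and 4, so at 50% minimum support only {a} stays large.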
Efficient Mining of Association Rules in Distributed Databases
, 1996
Cited by 98 (3 self)
Abstract
Many sequential algorithms have been proposed for mining association rules. However, very little work has been done on mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of communication overhead. In this study, an efficient algorithm, DMA, is proposed. It generates a small number of candidate sets and requires only O(n) messages for support count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental test bed, and its performance is studied. The results show that DMA has superior performance when compared with the direct application of a popular sequential algorithm in distributed databases.
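Two ideas behind a small candidate set and O(n) messages per candidate can be simulated in one process. First, an itemset that is large in the whole database must be locally large at one or more sites. If its count were below `min_sup` times the site size at every site, the sum would fall below `min_sup` times the total size. Second, the global count of a surviving candidate is the sum of one local count from each of the n sites. This is a hypothetical single-machine sketch, not DMA's actual protocol. Partitions stand in for sites, and the message counter tallies only one count reply per site per candidate.

```python
from collections import Counter


def local_counts(partition, candidates):
    """Support counts of candidate itemsets (frozensets) within one site."""
    counts = Counter()
    for t in partition:
        ts = set(t)
        for c in candidates:
            if c <= ts:
                counts[c] += 1
    return counts


def global_large(partitions, candidates, min_sup):
    per_site = [local_counts(p, candidates) for p in partitions]
    total_size = sum(len(p) for p in partitions)
    # Keep only candidates that are locally large somewhere; the others
    # cannot be globally large, so their counts are never exchanged.
    survivors = [
        c for c in candidates
        if any(cnt[c] >= min_sup * len(p) for cnt, p in zip(per_site, partitions))
    ]
    result, messages = {}, 0
    for c in survivors:
        total = 0
        for cnt in per_site:      # one count reply per site: O(n) per candidate
            total += cnt[c]
            messages += 1
        if total >= min_sup * total_size:
            result[c] = total
    return result, messages
```

With site 1 holding [a, b] and [a], and site 2 holding [b], [b, c], and [a, b], the candidate {c} is locally large nowhere at 50% support and is pruned without any exchange. {a} (count 3) and {b} (count 4) are confirmed with two replies each.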
Using a Hash-Based Method with Transaction Trimming and Database Scan Reduction for Mining Association Rules
 IEEE Transactions on Knowledge and Data Engineering
, 1997
Cited by 83 (9 self)
Abstract
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means, given a database of sales transactions, discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usual...
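The hash-based idea for early iterations can be sketched for candidate 2-itemsets. While items are counted in the first pass, every 2-itemset of each transaction is also hashed into a bucket. A pair of large items then stays a candidate only if its bucket count reaches the minimum support. Because a bucket count is at least the count of every pair hashed into it, no large pair is lost. The hash `(10 * order(x) + order(y)) mod num_buckets`, the bucket count of 7, and the name `dhp_candidate_pairs` are illustrative assumptions, and transaction trimming and scan reduction are not shown.

```python
from collections import Counter
from itertools import combinations


def dhp_candidate_pairs(transactions, min_count, num_buckets=7):
    """Large 1-itemsets plus candidate 2-itemsets filtered by a hash table."""
    items = sorted({i for t in transactions for i in t})
    order = {item: idx for idx, item in enumerate(items)}

    def bucket(pair):
        x, y = pair
        return (10 * order[x] + order[y]) % num_buckets

    item_counts = Counter()
    buckets = [0] * num_buckets
    for t in transactions:
        t = sorted(set(t))
        item_counts.update(t)
        # Hash every 2-itemset of the transaction during the same pass.
        for pair in combinations(t, 2):
            buckets[bucket(pair)] += 1
    large1 = sorted(i for i, c in item_counts.items() if c >= min_count)
    # Without the hash filter, every pair of large items would be a candidate.
    candidates = [p for p in combinations(large1, 2) if buckets[bucket(p)] >= min_count]
    return large1, candidates
```

For the transactions {A, C, D}, {B, C, E}, {A, B, C, E}, and {B, E} with a minimum count of 2, the large items are A, B, C, and E. The hash filter keeps only AC, BC, BE, and CE of the six possible pairs, since AB and AE fall into buckets with a count of 1.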
Finding Intensional Knowledge of Distance-Based Outliers
 In VLDB
, 1999
Cited by 80 (1 self)
Abstract
Existing studies on outliers focus only on the identification aspect; none provides any intensional knowledge of the outliers, by which we mean a description or an explanation of why an identified outlier is exceptional. For many applications, a description or explanation is at least as vital to the user as the identification aspect. Specifically, intensional knowledge helps the user to: (i) evaluate the validity of the identified outliers, and (ii) improve one's understanding of the data. The two main issues addressed in this paper are: what kinds of intensional knowledge to provide, and how to optimize the computation of such knowledge. With respect to the first issue, we propose finding strongest and weak outliers and their corresponding structural intensional knowledge. With respect to the second issue, we first present a naive and a semi-naive algorithm. Then, by means of what we call path and semilattice sharing of I/O processing, we develop two optimized approaches. We provi...
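A naive approach can be sketched by testing every attribute subset, which is the kind of exhaustive I/O the optimized approaches aim to share. This sketch assumes the distance-based outlier definition DB(p, D) from Knorr and Ng's earlier work, which is not restated in the abstract. Under that definition, a point is an outlier if at least a fraction p of the dataset lies farther than distance D from it. For each point, the sketch reports the minimal attribute subsets in which it is a DB(p, D)-outlier. This is our simplified illustration of "structural" knowledge, not the paper's exact definitions of strongest and weak outliers. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def is_db_outlier(points, idx, attrs, p, D):
    """True if at least a fraction p of all points lie farther than D from points[idx],
    measuring Euclidean distance on the attribute indices in attrs only."""
    o = [points[idx][a] for a in attrs]
    far = sum(1 for q in points if dist(o, [q[a] for a in attrs]) > D)
    return far >= p * len(points)


def minimal_outlying_subspaces(points, idx, num_attrs, p, D):
    """Naively test every attribute subset, smallest first, and keep minimal hits."""
    found = []
    for k in range(1, num_attrs + 1):
        for attrs in combinations(range(num_attrs), k):
            if any(set(f) <= set(attrs) for f in found):
                continue  # a smaller subset already makes this point an outlier
            if is_db_outlier(points, idx, attrs, p, D):
                found.append(attrs)
    return found
```

For the points (0, 0), (1, 0), (0, 1), (1, 1), and (10, 0) with p = 0.75 and D = 2, the last point is an outlier on attribute 0 alone, so `[(0,)]` is reported for it. The point (0, 0) is not an outlier in any subset.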