10 citations found.
S. D. Bay and M. J. Pazzani. Detecting change in categorical data: mining contrast sets. In SIGKDD, 1999.


This paper is cited in the following contexts:
Mining Changes of Classification by Correspondence Tracing - Ke Wang, Senqiang

.... Work. In the context of association rule mining [3], incremental mining [6] maintains the completeness of association rules in the presence of insertion and deletion of data, active mining [2] tracks the change of support and confidence over time, and emerging pattern mining [8] and contrast set mining [4] identify conditions whose support has changed substantially across two or more groups. In all these works, each rule or pattern is considered in isolation; consequently, the reported changes can be variations or consequences of one another. In [13] fundamental rule changes are considered in the context of ....
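Active mining is characterized above only as tracking how support and confidence change over time. As a hedged illustration (not the method of [2]), the minimal sketch below assumes each time period is a snapshot of transactions stored as frozensets of items; `support`, `confidence`, and `track_rule` are hypothetical helper names.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def support(transactions: List[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[Itemset], antecedent: Itemset, consequent: Itemset) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0  # the rule never applies in this snapshot
    return support(transactions, antecedent | consequent) / base


def track_rule(snapshots: List[List[Itemset]], antecedent: Itemset,
               consequent: Itemset) -> List[Tuple[float, float]]:
    """(support, confidence) of one rule in each time-ordered snapshot (hypothetical helper)."""
    return [
        (support(s, antecedent | consequent), confidence(s, antecedent, consequent))
        for s in snapshots
    ]
```

Calling `track_rule([s1, s2], frozenset({"bread"}), frozenset({"milk"}))` yields one (support, confidence) pair per snapshot, so drift in either quantity is visible directly. Each rule is still evaluated in isolation, which is exactly the limitation the passage points out.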

S. D. Bay and M. J. Pazzani. Detecting change in categorical data: mining contrast sets. In SIGKDD, 1999.


Mining Changes of Classification by Correspondence Tracing - Wang, Zhou, Fu, Yu

.... Work. In the context of association rule mining [3], incremental mining [6] maintains the completeness of association rules in the presence of insertion and deletion of data, active mining [2] tracks the change of support and confidence over time, and emerging pattern mining [8] and contrast set mining [4] identify conditions whose support has changed substantially across two or more groups. In all these works, each rule or pattern is considered in isolation; consequently, the reported changes can be variations or consequences of one another. In [13] fundamental rule changes are considered in the context of ....

S. D. Bay and M. J. Pazzani. Detecting change in categorical data: mining contrast sets. In SIGKDD, 1999.


An Investigation Into the Relative Abilities of Three Alternative .. - Butler

....rules varying in size. List of figures: Figure 1: An example decision tree [19]. Figure 2: Example of the search tree for two attributes $A_1 = \{V_{11}, V_{12}\}$ and $A_2 = \{V_{21}, V_{22}\}$ [13]. Figure 3: The CRISP-DM reference model consists of six phases [46]. Figure 4: Distribution of ....

....models that learn through training and resemble biological neural networks in structure. 5. Rule induction: The extraction of useful if-then rules from data based on statistical significance. This includes association rules [12], which are useful for market basket analysis, and contrast sets [13, 14]. Association rules are discussed further in section 2.4 and contrast sets in section 2.5. Nearest neighbours: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset. Genetic algorithms: ....
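The nearest-neighbour description above can be made concrete with a short sketch. It assumes numeric feature vectors, Euclidean distance, and a simple majority vote as the "combination of the classes"; the excerpt fixes none of these choices, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_classify(history: List[Record], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest historical records."""
    if not history or k < 1:
        raise ValueError("need a non-empty history and k >= 1")
    # Rank historical records by Euclidean distance to the query (math.dist needs Python 3.8+).
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], query))
    # Count the class labels of the k closest records.
    votes = Counter(label for _, label in ranked[:k])
    # Counter keeps first-seen order on ties, so a tie goes to the class seen first
    # among the nearest records.
    return votes.most_common(1)[0][0]
```

Majority voting is only one way to combine neighbours' classes; distance-weighted voting is a common alternative when near neighbours should count more.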

[Article contains additional citation context not shown here]

Bay, S.D. and M.J. Pazzani. Detecting Change in Categorical Data: Mining Contrast Sets. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1999.


Data Mining At The Interface Of Computer Science And Statistics - Smyth (2001)   (1 citation)

.... et al. 2000). 3. Finding patterns (data mining) rather than global models (statistics): examples of pattern-finding algorithms include association rule algorithms [AIS93], sequential association algorithms [MTI95], rule induction algorithms [WB86, SG92, FF99], and contrast set algorithms [BP99]. These pattern-finding algorithms differ from more conventional statistical modeling in that they do not attempt to cover all of the observed data, but instead focus in a data-driven manner on local pockets of information. 4. The engineering of scale, namely, the data engineering aspects of ....

Bay, S. and Pazzani, M. (1999) Detecting change in categorical data: mining contrast sets, in Proceedings of the Fifth ACM International Conference on Knowledge Discovery and Data Mining, New York, NY: ACM Press, 302--305.


Multivariate Discretization for Set Mining - Bay (2000)   (1 citation)

....algorithms can be formulated as set mining, such as classification rules (e.g. Liu et al., 1998; Quinlan, 1993; Cohen, 1995), where the goal is to find sets of attribute-value (A-V) pairs with high predictive power, or contrast set mining (Bay & Pazzani, 1999; Bay & Pazzani, to appear), where the goal is to find all sets that represent large differences in the probability distributions of two or more groups. There has been much work devoted to speeding up search in set mining (Bayardo, 1998; Webb, 1995; Narendra & Fukunaga, 1977) and there are many ....

....the null hypothesis $H_0$ is that $F_x = F_y$ and the alternate hypothesis is that the two distributions are different, $F_x \neq F_y$. In this section, we review past approaches and discuss why they are inappropriate for our application. We argue for a new test based on recent work in contrast set mining (Bay & Pazzani, 1999; Bay & Pazzani, to appear). With a single dimension, one can use the Kolmogorov-Smirnov (K-S) two-sample test or the Wald-Wolfowitz (W-W) runs test (Conover, 1971) to check for differences. These methods sort the examples and compute statistics based on the ranks of sorted members in the list. ....
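To make the one-dimensional case concrete, here is a minimal sketch of the two-sample K-S statistic $D = \max_t |F_x(t) - F_y(t)|$, where $F_x$ and $F_y$ are the empirical distribution functions of the two samples. It computes only the statistic, not a p-value, and `ks_statistic` is a hypothetical name; SciPy's `scipy.stats.ks_2samp` returns both if a library routine is preferred.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so checking every pooled observation is enough to find the largest gap.
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # fraction of x values <= t
        fy = bisect_right(ys, t) / m  # fraction of y values <= t
        d = max(d, abs(fx - fy))
    return d
```

Sorting dominates the cost, so the statistic takes $O((n+m)\log(n+m))$ time. It is inherently one-dimensional, which is the limitation the passage goes on to address.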

[Article contains additional citation context not shown here]

Bay SD, & Pazzani MJ (1999) Detecting change in categorical data: Mining contrast sets. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 302--306.


Detecting Group Differences: Mining Contrast Sets - Bay, Pazzani   (4 citations)   Self-citation (Bay, Pazzani)

....The above equation takes the maximum over all $2^G$ extreme points and thus is the maximum over the feasible region. We expect to compare only a small number of groups, say $G \le 10$, so that the exponential number of extreme points we must evaluate is small. If $G$ is large, we can use $\chi^2$ bounds (Bay & Pazzani, 1999) that can be found in linear time. 3.4.3 Interest-Based Pruning. The previous pruning methods only eliminated deviations that could not meet the effect size or statistical significance criteria. In this section, we present pruning methods that may eliminate contrast sets that are deviations but ....
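The extreme-point argument can be sketched under stated assumptions: a specialization of the current contrast set can have any count $x_i$ between 0 and the current count $n_i$ in group $i$, so the feasible region is a box with $2^G$ corners, and the statistic is Pearson's $\chi^2$ on the $2 \times G$ table (contrast set present or absent, by group). Following the excerpt, the bound is the largest $\chi^2$ over those corners. The function names are hypothetical, and this is an illustration, not STUCCO's actual code.

```python
from itertools import product
from typing import Sequence


def chi_square_2xg(present: Sequence[float], group_sizes: Sequence[int]) -> float:
    """Pearson chi-square for a 2 x G table (rows: set present / absent; columns: groups)."""
    total = sum(group_sizes)
    absent = [size - p for p, size in zip(present, group_sizes)]
    stat = 0.0
    for row in (present, absent):
        row_total = sum(row)
        for observed, col_total in zip(row, group_sizes):
            expected = row_total * col_total / total
            if expected > 0:  # a zero expected count implies a zero observed count
                stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_bound(counts: Sequence[int], group_sizes: Sequence[int]) -> float:
    """Largest chi-square over the 2**G corners of the box 0 <= x_i <= counts[i]."""
    # Each corner sets every group's count to either 0 or its current value.
    corners = product(*[(0, c) for c in counts])
    return max(chi_square_2xg(corner, group_sizes) for corner in corners)
```

Because the loop visits every corner, the cost grows as $2^G$, which is why the excerpt reserves this exact bound for a small number of groups.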

....pairs which occur frequently in many records. These two factors combine to result in many long and frequent itemsets. The Adult and Mushroom datasets are available from the UCI Repository of Machine Learning Databases (Blake & Merz, 1998). The IPUMS data is available from the UCI KDD Archive (Bay, 1999). Note that PUMS data is based on cluster samples, i.e. samples are made of households or dwellings from which there may be multiple individuals. Individuals from the same household are no longer independent and thus we violate the independence assumption. We ignore this as we are using the ....

Bay, S. D., & Pazzani, M. J. (1999). Detecting change in categorical data: Mining contrast sets. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 302--306).


Characterizing Model Performance in the Feature Space - Bay, Pazzani   Self-citation (Bay, Pazzani)

....feature space with the goal of maximizing predictive accuracy. We implemented this simple framework (Bay & Pazzani, 2000) and tried two different algorithms as meta-learners: we used C5, a standard decision tree algorithm for classification, and we compared it with STUCCO, a contrast set miner (Bay & Pazzani, 1999). These two algorithms differ in three important ways. First, C5 is a discriminative algorithm, whereas STUCCO is a characteristic or informative approach (Rubinstein & Hastie, 1997). Second, C5 is incomplete, as it uses heuristic search, while STUCCO is complete. Finally, C5 produces an unordered ....

Bay, S. D., & Pazzani, M. J. (1999). Detecting change in categorical data: Mining contrast sets. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 302--306).


Discovering and Describing Category Differences: What makes a.. - Bay, Pazzani (2000)   (1 citation)   Self-citation (Bay, Pazzani)

....the forms can be made equivalent, the difference is that the X that maximizes Equation 1 is not necessarily the same as the X that is best for Equation 2. We now describe two algorithms representative of the approaches: C5 (Quinlan, 1993), which is a discriminative approach, and STUCCO (Bay & Pazzani, 1999), which is a characteristic approach. A Discriminative Approach: C5. A discriminative approach to distinguishing two or more groups from each other is to use a rule learner or decision tree to learn a classification strategy. In this paper, we use the program C5, which is an updated version of C4.5 ....

....The rules contain every term that appears in nodes along the path. C5 then tests each term that appears in a rule and removes terms that offer no predictive benefit. A Characteristic Approach: STUCCO. Here we briefly review the STUCCO algorithm for mining contrast sets. The reader is directed to (Bay & Pazzani, 1999, 1999b) for a more detailed description. STUCCO is a complete mining algorithm that searches for contrast sets (conjunctions of attribute-value pairs) that have substantially different probabilities across several distributions or groups. The goal is to find contrast sets where the value of Equation 2 ....
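As a rough illustration of what "substantially different probabilities across groups" means for one candidate, the sketch below computes a contrast set's support (the fraction of records matching all of its attribute-value pairs) in each group, plus the largest pairwise difference. As described in the STUCCO papers, this difference is compared against a user-chosen minimum deviation and backed by a $\chi^2$ significance test; neither step, nor the search over candidate sets, is shown. The record layout and function names are assumptions.

```python
from typing import Dict, List, Mapping

Record = Mapping[str, str]  # attribute -> categorical value


def group_supports(groups: Dict[str, List[Record]],
                   contrast_set: Mapping[str, str]) -> Dict[str, float]:
    """Fraction of each group's records that satisfy every attribute-value pair."""
    supports = {}
    for name, records in groups.items():
        matches = sum(
            1 for r in records
            if all(r.get(attr) == value for attr, value in contrast_set.items())
        )
        supports[name] = matches / len(records) if records else 0.0
    return supports


def max_support_difference(supports: Dict[str, float]) -> float:
    """Largest |support_i - support_j| over all pairs of groups (= max - min)."""
    values = list(supports.values())
    return max(values) - min(values)
```

Taking max minus min avoids looping over all group pairs explicitly, since the largest pairwise gap is always between the extreme supports.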

[Article contains additional citation context not shown here]

Bay, S. D., & Pazzani, M. J. (1999). Detecting change in categorical data: Mining contrast sets. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 302--306.


Multivariate Discretization of Continuous Variables for Set Mining - Bay (2000)   (2 citations)   Self-citation (Bay)

....a fundamental operation of data mining. In addition to association rule mining, many other large classes of algorithms can be formulated as set mining, such as classification rules [14], where the goal is to find sets of attribute-value (A-V) pairs with high predictive power, or contrast set mining [4, 5], where the goal is to find sets that represent large differences in the probability distributions of two or more groups. There has been much work devoted to speeding up search in set mining [6, 19], and there are many efficient algorithms when all of the data is discrete or categorical. The ....

....the null hypothesis $H_0$ is that $F_x = F_y$ and the alternate hypothesis is that the two distributions are different, $F_x \neq F_y$. In this section, we review past approaches and discuss why they are inappropriate for our application. We argue for a new test based on recent work in contrast set mining [4, 5]. With a single dimension, we can use the Kolmogorov-Smirnov (K-S) two-sample test or the Wald-Wolfowitz (W-W) runs test [7] to check for differences. These methods sort the examples and compute statistics based on the ranks of sorted members in the list. For example, the K-S test looks at the ....
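The runs test mentioned above can likewise be sketched in a few lines: pool and sort both samples, then count runs (maximal blocks of consecutive values from the same sample). Too few runs suggests the two distributions differ. This minimal version assumes no tied values and uses the usual large-sample normal approximation for the number of runs under $H_0$; `runs_test` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def runs_test(x: Sequence[float], y: Sequence[float]) -> Tuple[int, float, float]:
    """Wald-Wolfowitz runs test: (observed runs, expected runs under H0, z-score)."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # Pool the samples, tag each value with its origin (0 = x, 1 = y), and sort by value.
    labels = [tag for _, tag in sorted([(v, 0) for v in x] + [(v, 1) for v in y])]
    # A new run starts wherever the origin label changes.
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z
```

A strongly negative z-score (few runs) is evidence against $H_0$: the two samples cluster together rather than interleave. Ties across samples would make the run count depend on how they are ordered, hence the no-ties assumption.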

[Article contains additional citation context not shown here]

S. D. Bay and M. J. Pazzani. Detecting change in categorical data: Mining contrast sets. In Proc. of the 5th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 302--306, 1999.


Mining Changes of Classification by Correspondence Tracing - Wang, Zhou, Fu, Yu

No context found.

S. D. Bay and M. J. Pazzani. Detecting change in categorical data: mining contrast sets. In SIGKDD, 1999.


CiteSeer.IST - Copyright Penn State and NEC