27 citations found.
V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In Proceedings of 18th Symposium on Principles of Database Systems, pages 126--137. ACM Press, 1999.


This paper is cited in the following contexts:
Mining Changes of Classification by Correspondence Tracing - Ke Wang, Senqiang Zhou..

....some extent, i.e. follow a similar splitting in the decision tree construction. This restriction makes it less likely to find important changes. For example, important attributes often occur at top levels of the decision tree, and if such attributes change, the method in [11] cannot be used. In [9], the change between two classifiers is measured by the amount of work required to transform them into some common specialization. In real life, a human user hardly thinks of changes in terms of such a common specialization. We believe that a new classifier should best capture the ....

....the classification accuracy. Work on drifting environments [18, 10] is concerned with producing a classifier by assigning more weight to recently arrived data. [5] exploits user knowledge to construct an understandable classifier. None of these works addresses the change mining problem studied here. [9] presents a framework for measuring changes between two models, such as two classifiers. A model is represented by a partition of the data space that summarizes the data. The change between two models is measured by the amount of work required to transform the two models into the common specialization ....

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In PODS, 1999.
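The Wang et al. contexts above summarize the core idea of the framework: each model is a partition of the data space together with a summary of the data in each region, and two models are compared on a common specialization of their partitions. The following is a minimal sketch under loudly stated assumptions: models are one-dimensional and given as sorted cut points, the summary of a region is the fraction of the data falling in it, and the deviation is the sum of absolute per-region differences. The function names `refine`, `region_fractions` and `deviation` are hypothetical. The paper itself instantiates the framework for richer model classes such as frequent itemsets, decision trees and clusters.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical sketch of a FOCUS-style deviation for 1-D interval models;
# not the authors' code.


def refine(cuts_a: Sequence[float], cuts_b: Sequence[float]) -> List[float]:
    """Common refinement of two interval partitions of the real line:
    every cut point of either model is a cut point of the refinement."""
    return sorted(set(cuts_a) | set(cuts_b))


def region_fractions(data: Sequence[float], cuts: Sequence[float]) -> List[float]:
    """Fraction of `data` in each region (-inf, c0), [c0, c1), ..., [ck, +inf)."""
    counts = [0] * (len(cuts) + 1)
    for x in data:
        # bisect_right counts the cut points <= x, i.e. the region index of x.
        counts[bisect_right(cuts, x)] += 1
    return [c / len(data) for c in counts]


def deviation(data1: Sequence[float], cuts1: Sequence[float],
              data2: Sequence[float], cuts2: Sequence[float]) -> float:
    """Measure both datasets on the common refinement of their models and
    sum the absolute per-region differences (assumed aggregate)."""
    common = refine(cuts1, cuts2)
    f1 = region_fractions(data1, common)
    f2 = region_fractions(data2, common)
    return sum(abs(a - b) for a, b in zip(f1, f2))


# Model 1 cuts at 2.0, model 2 cuts at 1.0; the refinement has three regions.
print(deviation([0.5, 1.5, 2.5, 3.5], [2.0], [0.5, 0.5, 1.5, 3.5], [1.0]))  # 0.5
```

In the usage line, the region fractions are [0.25, 0.25, 0.5] and [0.5, 0.25, 0.25], so the two datasets differ by 0.25 in two of the three regions. Absolute difference with summation is only one plausible choice of difference and aggregate function.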


Mining Changes of Classification by Correspondence Tracing - Wang, Zhou, Fu, Yu

....some extent, i.e. follow a similar splitting in the decision tree construction. This restriction makes it less likely to find important changes. For example, important attributes often occur at top levels of the decision tree, and if such attributes change, the method in [11] cannot be used. In [9], the change between two classifiers is measured by the amount of work required to transform them into some common specialization. In real life, a human user hardly thinks of changes in terms of such a common specialization. We believe that a new classifier should best capture the ....

....the classification accuracy. Work on drifting environments [18, 10] is concerned with producing a classifier by assigning more weight to recently arrived data. [5] exploits user knowledge to construct an understandable classifier. None of these works addresses the change mining problem studied here. [9] presents a framework for measuring changes between two models, such as two classifiers. A model is represented by a partition of the data space that summarizes the data. The change between two models is measured by the amount of work required to transform the two models into the common specialization ....

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In PODS, 1999.


Similarity Testing Between Heterogeneous Basket Datasets - Li et al. (2002)

....different applications, including work on time series and queries [13; 19; 30; 34], dataset similarities [2; 20; 27], attribute similarities [11; 12; 35], and database similarities [32; 33; 38]. Also, a general framework for comparing database objects with a certain property has been proposed [18]. Similarity measures between homogeneous datasets can be used for deviation detection [18], data quality mining [25], distributed mining [33], and trend analysis. When comparing two heterogeneous basket databases for similarity, the difficulty lies in the fact that there may be no known correspondence ....

....similarities [2; 20; 27], attribute similarities [11; 12; 35], and database similarities [32; 33; 38]. Also, a general framework for comparing database objects with a certain property has been proposed [18]. Similarity measures between homogeneous datasets can be used for deviation detection [18], data quality mining [25], distributed mining [33], and trend analysis. When comparing two heterogeneous basket databases for similarity, the difficulty lies in the fact that there may be no known correspondence between the two sets of attributes. This implies that similarity measures that are ....

[Article contains additional citation context not shown here]

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In Proceedings of 18th Symposium on Principles of Database Systems, pages 126--137. ACM Press, 1999.


Recent Progress on Selected Topics in Database Research - Chen, Li et al.

....the current status and what are the relatively stable factors over time? Clearly, to answer the above queries, we have to examine the changes. Some previous works also involve change detection. For example, the emerging patterns [27] characterize the changes from one data set to the other. In [28], Ganti et al. propose methods to measure the differences of the induced models in data sets. Incremental mining studies how to update the models/patterns by factoring in the incremental part of data. However, mining data streams requires online and dynamic detection and summarization of ....

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 31 - June 2, 1999.


Visualizing Changes in the Structure of Data for.. - Pampalk, Goebl, Widmer (2003)

....understand changes in evolving data streams using differential kernel density estimation with various window sizes. In evolving data streams the same data spaces are used at different points in time while the data items change. Other approaches analyzing the changes in data characteristics include [10], where the focus is on measuring the effects on data mining models instead of intuitively visualizing changes. 3. SELF-ORGANIZING MAPS The Self-Organizing Map (SOM) [17, 19], an unsupervised neural network, has successfully been applied in exploratory data analysis [13] with applications in ....

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh. A Framework for Measuring Changes in Data Characteristics. In Proceedings of the 18th Symposium on Principles of Database Systems, 1999.


When to Update the Sequential Patterns of Stream Data - Zheng, Xu, Ma (2003)

....[8] the symmetric difference was used to measure the difference of association rules. But Lee and Cheung only considered the difference of association rules, and did not consider that the performance of incremental updating algorithms changes with the size of the added transactions. Ganti et al. [9,10] focused on incremental stream data mining model maintenance and change detection under block evolution. However, they also didn't consider the performance of incremental data mining algorithms for the evolving data. Obviously, with the increment of the size of incremental windows, the ....

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh, "A framework for measuring changes in data characteristics," in Proceedings of the 18th Symposium on Principles of Database Systems, 1999.
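Both Zheng, Xu, Ma contexts cite the symmetric difference as the measure Lee and Cheung used to compare sets of association rules. The set operation itself can be sketched minimally as below. The sketch assumes a rule is identified only by its antecedent and consequent, ignoring support and confidence. The helper names `rule`, `rule_set_difference` and `show` are hypothetical. Any weighting or normalization Lee and Cheung may apply is not described here and is not attempted.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical sketch: a rule "X => Y" is identified by its (antecedent,
# consequent) item sets; support and confidence are deliberately ignored.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def rule(lhs: Iterable[str], rhs: Iterable[str]) -> Rule:
    return (frozenset(lhs), frozenset(rhs))


def rule_set_difference(old: Set[Rule], new: Set[Rule]) -> Set[Rule]:
    """Rules present in exactly one of the two rule sets (symmetric difference)."""
    return old ^ new


def show(r: Rule) -> str:
    """Human-readable form of a rule, e.g. 'beer => chips'."""
    return f"{','.join(sorted(r[0]))} => {','.join(sorted(r[1]))}"


old = {rule({"bread"}, {"milk"}), rule({"beer"}, {"chips"})}
new = {rule({"bread"}, {"milk"}), rule({"diapers"}, {"beer"})}
changed = rule_set_difference(old, new)
print(sorted(show(r) for r in changed))  # ['beer => chips', 'diapers => beer']
```

The result contains the rule that disappeared from the old set and the rule that appears only in the new one. The shared rule `bread => milk` cancels out.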


Fuzzy Data Mining for Discovering Changes in Association Rules.. - Au, Chan

....significantly in different time periods (stable and semi-stable rules) and the rules that indicate some systematic trends (trend rules) based on a number of statistical tests, is not developed for mining and predicting changes in rules over time. Furthermore, a framework has been proposed in [11] for measuring the difference between two datasets (called deviation). It makes use of a data mining algorithm to build two models, one from each dataset. The difference between the models is used as the measure of deviation between the underlying datasets. The deviation measure employed in ....

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh, "A Framework for Measuring Changes in Data Characteristics," in Proc. of the 18th ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems, Philadelphia, PA, 1999, pp. 126-137.


When to Update the Sequential Patterns of Stream Data - Zheng, Xu, Ma (2002)

....the symmetric difference was used to measure the difference of association rules. But Lee and Cheung only considered the difference of association rules, and did not consider that the performance of incremental updating algorithms changes with the size of the added transactions. Ganti et al. [15,16] focused on incremental stream data mining model maintenance and change detection under block evolution. However, they also didn't consider the performance of incremental data mining algorithms for the evolving data. Obviously, with the increment of the size of incremental windows, the ....

....search in sequences of events, which is based on the use of random projections. Das et al. [12] also introduced the notion of an external measure between attribute A and attribute B, defined by looking at the values of probe functions on subrelations defined by A and B. Ganti et al. [15] studied the incremental stream data mining model maintenance and change detection under block evolution. They adopted the FOCUS framework [16] for change detection, which measures the deviation between two datasets first by the class of decision tree models and then by the class of frequent ....

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh, A framework for measuring changes in data characteristics, in Proceedings of the 18th Symposium on Principles of Database Systems, 1999.


Discovering the Set of Fundamental Rule Changes - Liu, Hsu, Ma (2001)   (4 citations)

....The user can then focus his/her attention on those important, interesting aspects of changes, and selectively view those less important or explainable changes. 2. RELATED WORK Mining or learning in a changing environment has been studied in both data mining and machine learning. In data mining, [1, 5, 8] address the problem of monitoring the support and confidence of association rules. Given an association rule, their techniques track the support and confidence variations of the rule over time. These techniques basically belong to the simple approach mentioned in Section 1. None of them aims to ....

....rule base. Ups and downs in support or confidence over time (called history) are represented and defined using shape operators. The user can then query the rule base by specifying some history specifications. Clearly, this is different from our work, as it does not mine fundamental rule changes. [8] presents a general framework for measuring changes or differences in two models (e.g. two sets of association rules from two datasets). The framework does not mine fundamental changes. [4, 9] study the maintenance of discovered association rules. The techniques aim to incrementally update the ....

Ganti, V., Gehrke, J., and Ramakrishnan, R. "A framework for measuring changes in data characteristics." PODS-99, 1999.


Efficiently Determine the Starting Sample Size for.. - Gu, Liu, Hu, Liu

....a sample, in which the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries (exactly). However, in both [2] and [5], how to decide a proper sample size is not mentioned. The work that we find most similar to ours is that given in [4], where Ganti et al. introduce a measure to quantify the difference between two data sets in terms of the models built by a given data mining algorithm. Our measure is different as it is based on statistical information divergence of the data sets. Although [4] also addresses the issue of building ....

....most similar to ours is that given in [4], where Ganti et al. introduce a measure to quantify the difference between two data sets in terms of the models built by a given data mining algorithm. Our measure is different as it is based on statistical information divergence of the data sets. Although [4] also addresses the issue of building models using random samples and shows that bigger sample sizes produce better models, it does not study how to determine a proper sample size. Our method can be a useful complement to the existing techniques. The proposed SOSS also has a good property: it ....

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh. A framework for measuring changes in data characteristics. In Proceedings of PODS'99, 1999.


Caching for Multi-dimensional Data Mining Queries - Nag, Deshpande, DeWitt

[Figure 1: A Chunked Cache. Figure 2: Structure of an Itemset.]
....association rules, but one can run a trend detection algorithm [4] on top of these sets of discovered rules to find if a definite temporal pattern exists. For example, the following query will find out if a trend exists in the day-to-day buying patterns of the previous month. SELECT trend(assoc_rules) FROM (SELECT d.day, rules(s.pid) AS assoc_rules FROM ....

V. Ganti, J. Gehrke, R. Ramakrishnan, W. Loh. "A Framework for Measuring Changes in Data Characteristics". PODS Conf., 1999.


Detecting Group Differences: Mining Contrast Sets - Bay, Pazzani   (4 citations)

....changes in a single distribution as it varies through time. Thus a query such as "How does group A differ from B?" has no meaning in their data model, as different groups (distributions) do not exist. Conversely, in our model, asking for what has changed without reference to a group is nonsensical. Ganti, Gehrke, Ramakrishnan, and Loh (1999) work on detecting the differences between datasets by examining differences between models induced on the data. They represent each model with a structure component, which identifies regions in the feature space, and a measure component, which summarizes the data mapped to the region (e.g. fraction ....

Ganti, V., Gehrke, J. E., Ramakrishnan, R., & Loh, W. (1999). A framework for measuring changes in data characteristics. Proceedings of Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.


Mining Changes for Real-Life Applications - Liu, Hsu, Han, Xia (2000)   (4 citations)

....learning theory has been focused on generating accurate predictors in a drifting environment [e.g. 14, 5, 7, 17]. It does not produce the explicit changes that have occurred. In data mining, [1, 4] addressed the problem of monitoring the support and confidence changes of association rules. [6] gave a theoretical framework for measuring changes. We will discuss these and the other related works in Section 4. 2. Mining Changes in the Decision Tree Model Decision tree construction is one of the important model building techniques. Given a data set with a fixed discrete class attribute, ....

....New decision tree: In this method, we generate a new decision tree using the new data, and then overlay the new decision tree on the old decision tree and compare the intersections of regions. The intersection regions that have conflicting class labels are the changes. This idea was suggested in [6]. 2. Same attribute and best cut: This method modifies the decision tree algorithm so that in generating the new tree with the new data, it uses the same attribute as in the old tree at each step of partitioning, but it does not have to choose the same cut point for the attribute as in the old ....

[Article contains additional citation context not shown here]

Ganti, V., Gehrke, J., and Ramakrishnan, R. "A framework for measuring changes in data characteristics." PODS-99.
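The first method in the Liu, Hsu, Han, Xia context overlays the new decision tree on the old one and reports intersecting regions whose class labels conflict. The context notes that this idea was suggested by the framework paper. A minimal sketch follows, assuming each tree has already been flattened into its leaves and that each leaf is an axis-aligned box of half-open intervals paired with a class label. The box representation and the names `intersect` and `conflicting_regions` are illustrative and do not come from either paper.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a leaf region is an axis-aligned box mapping
# attribute -> half-open interval [lo, hi); a missing attribute is unbounded.
Box = Dict[str, Tuple[float, float]]
Leaf = Tuple[Box, str]  # (region, class label)

UNBOUNDED = (float("-inf"), float("inf"))


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Intersection of two boxes, or None if they do not overlap."""
    out: Box = {}
    for attr in set(a) | set(b):
        lo1, hi1 = a.get(attr, UNBOUNDED)
        lo2, hi2 = b.get(attr, UNBOUNDED)
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo >= hi:
            return None
        out[attr] = (lo, hi)
    return out


def conflicting_regions(old: List[Leaf], new: List[Leaf]):
    """Overlay the leaves of two trees and keep the intersections whose
    class labels differ: (region, old label, new label)."""
    changes = []
    for (box_o, lab_o), (box_n, lab_n) in product(old, new):
        region = intersect(box_o, box_n)
        if region is not None and lab_o != lab_n:
            changes.append((region, lab_o, lab_n))
    return changes


old_tree = [({"age": (float("-inf"), 30.0)}, "no"), ({"age": (30.0, float("inf"))}, "yes")]
new_tree = [({"age": (float("-inf"), 40.0)}, "no"), ({"age": (40.0, float("inf"))}, "yes")]
print(conflicting_regions(old_tree, new_tree))
# [({'age': (30.0, 40.0)}, 'yes', 'no')]
```

The old tree splits `age` at 30 and the new one at 40. The only conflicting region is therefore 30 <= age < 40, which the old tree labels "yes" and the new tree labels "no".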


DEMON: Mining and Monitoring Evolving Data - Ganti, Gehrke, Ramakrishnan (2000)   (15 citations)  Self-citation (Ganti, Gehrke, Ramakrishnan)

No context found.

Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, and Wei-Yin Loh. A framework for measuring changes in data characteristics. In Proceedings of the 18th Symposium on Principles of Database Systems, 1999.


A Framework for Measuring Changes in Data Characteristics - Ganti, Gehrke.. (1999)   (23 citations)  Self-citation (Ganti, Gehrke, Ramakrishnan, Loh)

No context found.

Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, and Wei-Yin Loh. A framework for measuring changes in data characteristics. http://www.cs.wisc.edu/~vganti/pods99-full.ps, November 1998.


DEMON: Mining and Monitoring Evolving Data - Ganti, Gehrke, Ramakrishnan (1999)   (15 citations)  Self-citation (Ganti, Gehrke, Ramakrishnan)

....for any dynamically changing database. Assuming our model of systematic block evolution, we now describe some initial results for detecting patterns of similar blocks. To detect a pattern from a sequence of blocks, a notion of similarity between blocks of data is required. In prior work [GGRL99a] we developed a framework for computing an interpretable, statistically qualifiable measure of difference called deviation between two datasets. The deviation quantifies the differences between interesting characteristics in each dataset (as reflected in the data mining models they induce). The ....

....Formally, we say that blocks $D_1$ and $D_2$ are $M$-similar at significance level $\alpha$ ($0 \le \alpha \le 1$) if $\delta_M(D_1, D_2) \le \alpha$. Note that our notion of similarity is symmetric, but not transitive. The computation of the deviation measure is fast since it requires at most one scan of each dataset. See [GGRL99a] for details. One approach for finding groups of similar data blocks, assuming a computable similarity function between blocks, is to treat each block as an object. Finding groups of similar blocks now reduces to an instance of the clustering problem. This approach has one drawback: clustering ....

Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, and Wei-Yin Loh. A framework for measuring changes in data characteristics. In Proceedings of the 18th Symposium on Principles of Database Systems, 1999.
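The DEMON contexts describe deviation as a statistically qualifiable measure and compare blocks at a significance level. According to the FOCUS contexts, significance is obtained with bootstrapping, and the details are left to the full paper. The following is only a sketch under assumptions. It swaps in a permutation-style resampling test, a close relative of the bootstrap, and uses half the L1 distance between fixed-bin histograms as a stand-in deviation. Both substitutions and the names `histogram`, `deviation` and `significance` are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical sketch; the deviation and the resampling test are stand-ins,
# not the procedure from the FOCUS or DEMON papers.


def histogram(data: Sequence[int], n_bins: int) -> List[float]:
    """Fraction of the data in each bin; values are assumed to be bin ids 0..n_bins-1."""
    counts = [0] * n_bins
    for x in data:
        counts[x] += 1
    return [c / len(data) for c in counts]


def deviation(d1: Sequence[int], d2: Sequence[int], n_bins: int) -> float:
    """Stand-in deviation: half the L1 distance between the histograms, in [0, 1]."""
    h1, h2 = histogram(d1, n_bins), histogram(d2, n_bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def significance(d1: Sequence[int], d2: Sequence[int], n_bins: int,
                 n_resamples: int = 200, seed: int = 0) -> float:
    """Permutation p-value: how often two datasets split at random from the
    pooled data (one common distribution) deviate at least as much as d1, d2."""
    rng = random.Random(seed)
    observed = deviation(d1, d2, n_bins)
    pooled = list(d1) + list(d2)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        a, b = pooled[:len(d1)], pooled[len(d1):]
        if deviation(a, b, n_bins) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


print(significance([0, 1] * 25, [1, 0] * 25, n_bins=2))    # 1.0
print(significance([0] * 50, [1] * 50, n_bins=2) < 0.05)   # True
```

A small value means that two blocks drawn from one common distribution rarely deviate as much as the observed pair does, so the deviation is significant. Identical blocks give a value of 1.0 because every resampled pair deviates at least as much as the observed deviation of 0.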


A Framework for Measuring Changes in Data Characteristics - Ganti, Gehrke.. (1998)   (23 citations)  Self-citation (Ganti, Gehrke, Ramakrishnan, Loh)

....dt-models, and cluster-models. Informally, a lits-model is the set of frequent itemsets; a dt-model is a decision tree; a cluster-model is a set of clusters. We assume that the reader is familiar with each of these classes of models. For a formal description, see [3, 8, 38] or the full paper [18]. In this section, we illustrate the concepts and ideas behind the computation of deviation between two datasets first through the class of decision tree models and then through the class of frequent itemsets. In Section 3, we formalize these concepts. 2.1 dt-models Let the decision tree ....

....statistical tests to compute the significance sig(d) of the deviation d between two datasets. We use bootstrapping techniques from Statistics [14] to compute F. We omit the details due to space constraints. See the full paper for details of the bootstrapping procedure and the statistical tests [18]. 4 Instantiations In this section, we instantiate the FOCUS framework for lits-models, dt-models, and cluster-models. Wherever possible, we analyze the properties of the instantiated deviation functions. 4.1 lits-models We first show that the class of lits-models exhibits the meet-semilattice ....

[Article contains additional citation context not shown here]

Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, and Wei-Yin Loh. A framework for measuring changes in data characteristics. http://www.cs.wisc.edu/~vganti/pods99-full.ps, November 1998.
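The FOCUS contexts describe a lits-model as the set of frequent itemsets. A lits-model deviation can be sketched as follows. The sketch assumes three things: the common structure of two lits-models is the union of their frequent itemsets, the measure is each itemset's support in each dataset, and the deviation sums the absolute support differences. The brute-force enumeration up to `max_size` stands in for a real frequent-itemset miner. All names (`support`, `frequent_itemsets`, `lits_deviation`, `max_size`) are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Set

# Hypothetical sketch of a lits-model deviation; not the FOCUS implementation.
Itemset = FrozenSet[str]


def support(transactions: Sequence[Set[str]], itemset: Itemset) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions: Sequence[Set[str]], minsup: float,
                      max_size: int = 2) -> Set[Itemset]:
    """Brute-force lits-model: all itemsets up to max_size with support >= minsup."""
    items = sorted(set().union(*transactions))
    result: Set[Itemset] = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(transactions, s) >= minsup:
                result.add(s)
    return result


def lits_deviation(t1: Sequence[Set[str]], t2: Sequence[Set[str]],
                   minsup: float, max_size: int = 2) -> float:
    """Compare two lits-models on the union of their itemsets (assumed common
    structure), summing absolute support differences."""
    structure = frequent_itemsets(t1, minsup, max_size) | frequent_itemsets(t2, minsup, max_size)
    return sum(abs(support(t1, s) - support(t2, s)) for s in structure)


t1 = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
t2 = [{"a"}, {"a"}, {"a"}, {"b"}]
print(lits_deviation(t1, t2, minsup=0.5))  # 1.0
```

With a minimum support of 0.5, the first dataset yields {a}, {b} and {a, b}, while the second yields only {a}. Measured on the union, the supports of {b} and {a, b} each differ by 0.5, so the deviation is 1.0. Itemsets that are frequent in only one dataset still have their support counted in both.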


Association-Based Similarity Testing and Its Applications - Tao Li

No context found.

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In Proceedings of 18th Symposium on Principles of Database Systems, pages 126--137. ACM Press, 1999.


March 2002 - Un Vers Ty

No context found.

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In Proceedings of 18th Symposium on Principles of Database Systems, pages 126--137. ACM Press, 1999.


A New Distributed Data Mining Model Based on Similarity - Tao Li (2003)   (1 citation)

No context found.

Ganti, V., Gehrke, J., & Ramakrishnan, R. (1999). A framework for measuring changes in data characteristics. Proceedings of 18th Symposium on Principles of Database Systems (pp. 126--137). ACM Press.


Exploring Similarities Across High-Dimensional Datasets - Karlton Sequeira

No context found.

V. Ganti, J. Gehrke, R. Ramakrishnan, and W. Loh. A framework for measuring changes in data characteristics. In PODS, 1999.


Associative Clustering for Exploring Dependencies.. - Kaski, Nikkilä.. (2005)

No context found.

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh. A framework for measuring changes in data characteristics. In Proceedings of ACM PODS 1999.


From Learning Metrics towards Dependency Exploration - Kaski

No context found.

V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh. A framework for measuring changes in data characteristics. In Proceedings of ACM PODS 1999.


Online Mining of Changes from Data Streams: Research Problems And.. (2003)

No context found.

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In PODS'99, pages 126--137, Philadelphia, PA, May/June 1999.


Mining Changes of Classification by Correspondence Tracing - Wang, Zhou, Fu, Yu

No context found.

V. Ganti, J. Gehrke, and R. Ramakrishnan. A framework for measuring changes in data characteristics. In PODS, 1999.


Mining Data Streams - Jin

No context found.

V. Ganti, J. Gehrke, R. Ramakrishnan, and W. Loh. A framework for measuring changes in data characteristics. In PODS, 1999.


Thesis Proposal - Ruoming Jin

No context found.

V. Ganti, J. Gehrke, R. Ramakrishnan, and W. Loh. A framework for measuring changes in data characteristics. In PODS, 1999.


CiteSeer.IST - Copyright Penn State and NEC