Results 1-10 of 31
Local probabilistic models for link prediction
In ICDM, 2007
Cited by 56 (0 self)
One of the core tasks in social network analysis is to predict the formation of links (i.e., various types of relationships) over time. Previous research has generally represented the social network in the form of a graph and has leveraged topological and semantic measures of similarity between two nodes to evaluate the probability of link formation. Here we introduce a novel local probabilistic graphical model method that can scale to large graphs to estimate the joint co-occurrence probability of two nodes. Such a probability measure captures information that is not captured by either topological measures or measures of semantic similarity, which are the dominant measures used for link prediction. We demonstrate the effectiveness of the co-occurrence probability feature by using it both in isolation and in combination with other topological and semantic features for predicting co-authorship collaborations on three real datasets.
Mining periodic behaviors for moving objects
In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 2010
Cited by 39 (9 self)
Periodicity is a frequently happening phenomenon for moving objects. Finding periodic behaviors is essential to understanding object movements. However, periodic behaviors could be complicated, involving multiple interleaving periods, partial time span, and spatiotemporal noises and outliers. In this paper, we address the problem of mining periodic behaviors for moving objects. It involves two subproblems: how to detect the periods in complex movement, and how to mine periodic movement behaviors. Our main assumption is that the observed movement is generated from multiple interleaved periodic behaviors associated with certain reference locations. Based on this assumption, we propose a two-stage algorithm, Periodica, to solve the problem. At the first stage, the notion of reference spot is proposed to capture the reference locations. Through reference spots, multiple periods in the movement can be retrieved using a method that combines Fourier transform and autocorrelation. At the second stage, a probabilistic model is proposed to characterize the periodic behaviors. For a specific period, periodic behaviors are statistically generalized from partial movement sequences through hierarchical clustering. Empirical studies on both synthetic and real data sets demonstrate the effectiveness of our method.
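The period-detection idea in this abstract (a Fourier transform to propose a candidate period, autocorrelation to pin down the exact value) can be illustrated with a minimal Python sketch. This is not the authors' Periodica implementation: the binary in/out-of-reference-spot encoding, the function names, and the rule for turning a frequency into a range of candidate lags are all assumptions made for illustration.

```python
import cmath
import math

def periodogram(x):
    """Power at each DFT frequency k = 1 .. n//2 (naive O(n^2) transform)."""
    n = len(x)
    mean = sum(x) / n
    power = {}
    for k in range(1, n // 2 + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(x))
        power[k] = abs(coeff) ** 2
    return power

def circular_autocorrelation(x, lag):
    """Overlap between the sequence and a copy of itself shifted by `lag`."""
    n = len(x)
    return sum(x[t] * x[(t + lag) % n] for t in range(n))

def detect_period(x):
    """Pick the strongest frequency, then refine the period by autocorrelation."""
    n = len(x)
    power = periodogram(x)
    k_best = max(power, key=power.get)
    # Frequency k only says the period lies roughly between n/(k+1) and n/(k-1).
    low = max(2, n // (k_best + 1))
    high = n - 1 if k_best == 1 else min(n - 1, math.ceil(n / (k_best - 1)))
    return max(range(low, high + 1),
               key=lambda lag: circular_autocorrelation(x, lag))

# Hypothetical data: an object sits in one reference spot for the first
# 8 hours of every 24-hour day, observed hourly for 10 days.
visits = [1 if hour % 24 < 8 else 0 for hour in range(240)]
print(detect_period(visits))
```

The periodogram alone gives only a coarse estimate because DFT frequencies are spaced unevenly in period terms; the autocorrelation step resolves the exact lag inside that range.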
Output Space Sampling for Graph Patterns
2009
Cited by 22 (5 self)
Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative or significant. The main motivation behind that is to cope with the scalability problem that graph mining algorithms suffer from when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers have also proposed sampling-based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework that is based on the Metropolis-Hastings algorithm to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility and efficiency of the proposed sampling approach.
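As a rough illustration of Metropolis-Hastings sampling over a discrete output space, the sketch below runs a random walk on a tiny graph of states. It is a generic textbook sampler, not the paper's framework: the states "A" to "D" stand in for frequent patterns, the neighbor lists stand in for some sub-pattern/super-pattern relation, and the weights stand in for an interestingness score; all of these are hypothetical.

```python
import random

def metropolis_hastings_walk(neighbors, weight, start, steps, rng):
    """Random walk whose visit frequencies approach weight[s] / sum(weights)."""
    counts = {state: 0 for state in neighbors}
    current = start
    for _ in range(steps):
        # Proposal: a uniformly chosen neighbor, so q(x -> y) = 1 / deg(x).
        proposal = rng.choice(neighbors[current])
        # Hastings ratio: target(y) * q(y -> x) / (target(x) * q(x -> y)).
        ratio = (weight[proposal] * len(neighbors[current])) / (
            weight[current] * len(neighbors[proposal]))
        if rng.random() < min(1.0, ratio):
            current = proposal
        counts[current] += 1
    return {state: count / steps for state, count in counts.items()}

# Hypothetical pattern space: undirected edges A-B, B-C, B-D, C-D.
neighbors = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}
weight = {"A": 1, "B": 2, "C": 3, "D": 4}

frequencies = metropolis_hastings_walk(
    neighbors, weight, start="A", steps=200_000, rng=random.Random(0))
print(frequencies)
```

The degree terms in the ratio matter because the proposal is not symmetric on an irregular graph; without them the walk would over-visit high-degree states.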
Tell me what I need to know: Succinctly summarizing data with itemsets
In Proc. KDD, 2011
Cited by 17 (3 self)
Data analysis is an inherently iterative process. That is, what we know about the data greatly determines our expectations, and hence, what result we would find the most interesting. With this in mind, we introduce a well-founded approach for succinctly summarizing data with a collection of itemsets; using a probabilistic maximum entropy model, we iteratively find the most interesting itemset, and in turn update our model of the data accordingly. As we only include itemsets that are surprising with regard to the current model, the summary is guaranteed to be both descriptive and non-redundant. The algorithm that we present can either mine the top-k most interesting itemsets, or use the Bayesian Information Criterion to automatically identify the model containing only the itemsets most important for describing the data. Or, in other words, it will ‘tell you what you need to know’. Experiments on synthetic and benchmark data show that the discovered summaries are succinct, and correctly identify the key patterns in the data. The models they form attain high likelihoods, and inspection shows that they summarize the data well with increasingly specific, yet non-redundant itemsets.
Directly Mining Descriptive Patterns
SIAM SDM, 2012
Cited by 15 (4 self)
Mining small, useful, and high-quality sets of patterns has recently become an important topic in data mining. The standard approach is to first mine many candidates, and then to select a good subset. However, the pattern explosion generates such enormous amounts of candidates that by post-processing it is virtually impossible to analyse dense or large databases in any detail. We introduce Slim, an anytime algorithm for mining high-quality sets of itemsets directly from data. We use MDL to identify the best set of itemsets as that set that describes the data best. To approximate this optimum, we iteratively use the current solution to determine what itemset would provide most gain, estimating quality using an accurate heuristic. Without requiring a pre-mined candidate collection, Slim is parameter-free in both theory and practice. Experiments show we mine high-quality pattern sets; while evaluating orders-of-magnitude fewer candidates than our closest competitor, Krimp, we obtain much better compression ratios, closely approximating the locally-optimal strategy. Classification experiments independently verify we characterise data very well.
MoveMine: Mining Moving Object Data for Discovery of Animal Movement Patterns
Cited by 13 (2 self)
With the maturity and wide availability of GPS, wireless, telecommunication, and Web technologies, massive amounts of object movement data have been collected from various moving object targets, such as animals, mobile devices, vehicles, and climate radars. Analyzing such data has deep implications in many applications, e.g., ecological study, traffic control, mobile communication management, and climatological forecast. In this paper, we focus our study on animal movement data analysis and examine advanced data mining methods for discovery of various animal movement patterns. In particular, we introduce a moving object data mining system, MoveMine, which integrates multiple data mining functions, including sophisticated pattern mining and trajectory analysis. In this system, two interesting moving object pattern mining functions are newly developed: (1) periodic behavior mining and (2) swarm pattern mining. For mining periodic behaviors, a reference location-based method is developed, which first detects the reference locations, discovers the periods in complex movements, and then finds periodic patterns by hierarchical clustering. For mining swarm patterns, an efficient method is developed to uncover flexible moving object clusters by relaxing the popularly-enforced collective movement constraints. In the MoveMine system, a set of commonly used moving object mining functions are built
On Effective Presentation of Graph Patterns: A Structural Representative Approach
In Proc. 2008 ACM Conf. on Information and Knowledge Management (CIKM'08), 2008
Cited by 11 (1 self)
In the past, quite a few fast algorithms have been developed to mine frequent patterns over graph data, with the large spectrum covering many variants of the problem. However, the real bottleneck for knowledge discovery on graphs is neither efficiency nor scalability, but the usability of the patterns that are mined. Currently, what state-of-the-art techniques give is a lengthy list of exact patterns, which are undesirable in the following two aspects: (1) on the micro side, due to various inherent noises or data diversity, exact patterns are usually not too useful in many real applications; and (2) on the macro side, the rigid structural requirement being posed often generates an excessive amount of patterns that are only slightly different from each other, which easily overwhelm the users. In this paper, we study the presentation problem of graph patterns, where structural representatives are deemed as the key mechanism to make the whole strategy effective. As a solution to fill the usability gap, we adopt a two-step smoothing-clustering framework, with the first step adding error tolerance to individual patterns (the micro side), and the second step reducing output cardinality by collapsing multiple structurally similar patterns into one representative (the macro side). This novel, integrative approach has never been tried in previous studies; it essentially rolls up our attention to a more appropriate level that no longer looks into every minute detail. The above framework is general and may apply under various settings and incorporate many extensions. Empirical studies indicate that a compact group of informative delegates can be achieved on real datasets and the proposed algorithms are both efficient and scalable.
Effective and Efficient Itemset Pattern Summarization: Regression-based Approaches
Cited by 9 (1 self)
In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of minimizing the restoration error for a set of itemsets based on a probabilistic model corresponds to a nonlinear regression problem. We show that under certain conditions, we can transform the nonlinear regression problem to a linear regression problem. We propose two new methods, k-regression and tree-regression, to partition the entire collection of frequent itemsets in order to minimize the restoration error. The k-regression approach, employing a K-means type clustering method, guarantees that the total restoration error achieves a local minimum. The tree-regression approach employs a decision-tree type of top-down partition process. In addition, we discuss alternatives to estimate the frequency for the collection of itemsets being covered by the k representative itemsets. The experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve the summarization performance in terms of both accuracy (restoration error) and computational cost.
Efficient Algorithms for the Mining of Constrained Frequent Patterns from Uncertain Data
Cited by 8 (0 self)
Mining of frequent patterns is one of the popular knowledge discovery and data mining (KDD) tasks. It also plays an essential role in the mining of many other patterns such as correlation, sequences, and association rules. Hence, it has been the subject of numerous studies since its introduction. Most of these studies find all the frequent patterns from collections of precise data, in which the items within each datum or transaction are definitely known and precise. However, there are many real-life situations in which the user is interested in only some tiny portions of these frequent patterns. Finding all frequent patterns would then be redundant and waste lots of computation. This calls for constrained mining, which aims to find only those frequent patterns that are interesting to the user. Moreover, there are also many real-life situations in which the data are uncertain. This calls for uncertain data mining. In this article, we propose algorithms to efficiently find constrained frequent patterns from collections of uncertain data.
Discovering coherent value bicliques in genetic interaction data
In Proceedings of the 9th International Workshop on Data Mining in Bioinformatics (BIOKDD’10), 2010
Cited by 6 (0 self)
Genetic Interaction (GI) data provides a means for exploring the structure and function of pathways in a cell. Coherent value bicliques (submatrices) in GI data represent functionally similar gene modules or protein complexes. However, no systematic approach has been proposed for exhaustively enumerating all coherent value submatrices in such data sets, which is the problem addressed in this paper. Using a monotonic range measure to capture the coherence of values in a submatrix of an input data matrix, we propose a two-step Apriori-based algorithm for discovering all nearly constant value submatrices, referred to as Range Constrained Blocks. By systematic evaluation on an extensive genetic interaction data set, we show that the coherent value submatrices represent groups of genes that are more functionally related than the submatrices with diverse values. We also show that our approach can exhaustively find all the submatrices with a range less than a given threshold, while the other competing approaches cannot find all such submatrices.
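The monotonicity that makes Apriori-style pruning possible can be shown with a small Python sketch. It assumes the range of a submatrix is simply its maximum value minus its minimum value; the abstract does not give the exact definition, and the function name and the toy score matrix are hypothetical.

```python
def value_range(matrix, rows, cols):
    """Range (max minus min) of the submatrix picked out by rows and cols."""
    values = [matrix[r][c] for r in rows for c in cols]
    return max(values) - min(values)

# Hypothetical genetic-interaction scores (rows and columns index genes).
gi_scores = [
    [0.9, 1.0, 3.0],
    [1.1, 1.0, 0.2],
    [1.0, 0.8, 2.5],
]

block = value_range(gi_scores, rows=[0, 1], cols=[0, 1])
larger = value_range(gi_scores, rows=[0, 1, 2], cols=[0, 1, 2])
print(block, larger)
```

Under this definition, adding rows or columns can never shrink the range, so once a block exceeds the threshold every block that contains it can be discarded without being examined.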