Discovery of collocation patterns: from visual words to visual phrases
In CVPR, 2007
Abstract

Cited by 63 (4 self)
A visual word lexicon can be constructed by clustering primitive visual features, and a visual object can be described by a set of visual words. Such a “bag-of-words” representation has led to many significant results in various vision tasks, including object recognition and categorization. However, in practice, the clustering of primitive visual features tends to result in synonymous visual words that over-represent visual patterns, as well as polysemous visual words that bring large uncertainties and ambiguities into the representation. This paper aims at generating a higher-level lexicon, i.e. a visual phrase lexicon, where a visual phrase is a meaningful spatially co-occurrent pattern of visual words. This higher-level lexicon is much less ambiguous than the lower-level one. The contributions of this paper include: (1) a fast and principled solution to the discovery of significant spatially co-occurrent patterns using frequent itemset mining; (2) a pattern summarization method that deals with the compositional uncertainties in visual phrases; and (3) a top-down refinement scheme for the visual word lexicon that feeds discovered phrases back to tune the similarity measure through metric learning.
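For illustration, the frequent-itemset step that turns co-occurring visual words into candidate phrases can be sketched on toy data; the transactions, the `min_support` value, and the `frequent_pairs` helper below are invented for the sketch and are not the paper's implementation.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count co-occurring pairs of items and keep the frequent ones,
    a toy stand-in for mining spatially co-occurrent visual words."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Each "transaction" is the set of visual words found in one local
# image neighbourhood (made-up data).
neighbourhoods = [
    {"door", "wheel", "window"},
    {"tree", "wheel", "window"},
    {"door", "wheel", "window"},
    {"sky", "tree"},
]
phrases = frequent_pairs(neighbourhoods, min_support=3)
# {("wheel", "window"): 3} -- a candidate visual phrase
```

The paper additionally summarizes such patterns and feeds them back into the lexicon, which this sketch does not attempt.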
Mining periodic behaviors for moving objects
In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010
Abstract

Cited by 39 (9 self)
Periodicity is a frequently occurring phenomenon for moving objects, and finding periodic behaviors is essential to understanding object movements. However, periodic behaviors can be complicated, involving multiple interleaving periods, partial time spans, and spatio-temporal noise and outliers. In this paper, we address the problem of mining periodic behaviors for moving objects. It involves two sub-problems: how to detect the periods in complex movement, and how to mine periodic movement behaviors. Our main assumption is that the observed movement is generated from multiple interleaved periodic behaviors associated with certain reference locations. Based on this assumption, we propose a two-stage algorithm, Periodica, to solve the problem. In the first stage, the notion of a reference spot is proposed to capture the reference locations. Through reference spots, multiple periods in the movement can be retrieved using a method that combines the Fourier transform and autocorrelation. In the second stage, a probabilistic model is proposed to characterize the periodic behaviors. For a specific period, periodic behaviors are statistically generalized from partial movement sequences through hierarchical clustering. Empirical studies on both synthetic and real data sets demonstrate the effectiveness of our method.
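The period-detection idea (finding the lag at which presence at a reference spot repeats) can be sketched with plain autocorrelation; `detect_period` and the hourly presence sequence below are illustrative assumptions, and the paper combines the Fourier transform with this step rather than using autocorrelation alone.

```python
def detect_period(x, max_lag=None):
    """Return the lag with the highest autocorrelation of the
    mean-centred sequence, a simplified stand-in for Periodica's
    combined Fourier-transform and autocorrelation step."""
    n = len(x)
    mean = sum(x) / n
    y = [v - mean for v in x]
    max_lag = max_lag or n // 2

    def autocorr(k):
        return sum(y[t] * y[t + k] for t in range(n - k))

    return max(range(1, max_lag + 1), key=autocorr)

# An object observed at a reference spot for 8 of every 24 hours,
# sampled hourly over ten days (made-up presence data).
presence = [1 if t % 24 < 8 else 0 for t in range(240)]
detect_period(presence)  # -> 24
```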
Extracting redundancy-aware top-k patterns
In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
Abstract

Cited by 36 (3 self)
As observed in many applications, there is a need to extract a small set of frequent patterns having not only high significance but also low redundancy, where significance is usually defined by the application context. Previous studies have concentrated on computing the top-k significant patterns or on removing redundancy among patterns, but separately; there is limited work on finding top-k patterns that exhibit high significance and low redundancy simultaneously. In this paper, we study the problem of extracting redundancy-aware top-k patterns from a large collection of frequent patterns. We first examine evaluation functions for measuring the combined significance of a pattern set and propose MMS (Maximal Marginal Significance) as the problem formulation. The problem is NP-hard. We further present a greedy algorithm that approximates the optimal solution with performance bound O(log k) (under conditions on redundancy), where k is the number of reported patterns. The direct use of redundancy-aware top-k patterns is illustrated through two real applications: disk block prefetching and document theme extraction. Our method can also be applied to processing redundancy-aware top-k queries in traditional databases.
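A minimal sketch of greedy marginal-significance selection, assuming a significance score per pattern and a pairwise redundancy function; the exact MMS objective and its approximation bound in the paper differ in detail, and all names below are mine.

```python
def greedy_topk(patterns, significance, redundancy, k):
    """Greedily pick k patterns by significance penalised with the
    worst redundancy against already-selected patterns. An illustrative
    marginal-significance heuristic, not the paper's exact MMS."""
    selected = []
    candidates = set(patterns)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda p: significance[p]
            - max((redundancy(p, q) for q in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

significance = {"ab": 0.9, "abc": 0.85, "xy": 0.6}

def jaccard(p, q):  # redundancy measured as item overlap
    return len(set(p) & set(q)) / len(set(p) | set(q))

team = greedy_topk(["ab", "abc", "xy"], significance, jaccard, k=2)
# "abc" is significant but redundant with "ab", so "xy" is chosen
```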
Summarizing Itemset Patterns Using Probabilistic Models
2006
Abstract

Cited by 31 (2 self)
In this paper, we propose a novel probabilistic approach to summarizing frequent itemset patterns. Such techniques are useful for summarization, post-processing, and end-user interpretation, particularly for problems where the resulting set of patterns is huge. In our approach, items in the dataset are modeled as random variables. We then construct a Markov Random Field (MRF) on these variables based on frequent itemsets and their occurrence statistics. The summarization proceeds in a level-wise iterative fashion: occurrence statistics of itemsets at the lowest level are used to construct an initial MRF, and statistics of itemsets at the next level can then be inferred from the model. We use those patterns whose occurrence cannot be accurately inferred from the model to augment the model, repeating the procedure until all frequent itemsets can be modeled. The resulting MRF model affords a concise and useful representation of the original collection of itemsets. An extensive empirical study on real datasets shows that the new approach can effectively summarize a large number of itemsets and typically significantly outperforms extant approaches.
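The level-wise "predict, then keep what the model gets wrong" idea can be sketched with an independence model standing in for the MRF; `deviating_itemsets` and the toy data below are assumptions for illustration only.

```python
from itertools import combinations

def deviating_itemsets(transactions, items, size, tol=0.1):
    """Keep `size`-itemsets whose observed support deviates from the
    prediction of an independence model over single-item frequencies.
    The paper plays the same level-wise game with a Markov Random
    Field in place of independence; this is only a sketch."""
    n = len(transactions)
    freq = {i: sum(i in t for t in transactions) / n for i in items}
    kept = []
    for combo in combinations(items, size):
        observed = sum(set(combo) <= t for t in transactions) / n
        predicted = 1.0
        for i in combo:
            predicted *= freq[i]
        if abs(observed - predicted) > tol:
            kept.append(combo)
    return kept

data = [{"a", "b", "c"}, {"a", "b"}, {"c"}, set()]
deviating_itemsets(data, ["a", "b", "c"], 2)
# -> [("a", "b")]: only this pair defies the independence prediction,
# so only it would be used to augment the model
```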
Pattern teams
In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'06), 2006
Abstract

Cited by 28 (4 self)
Pattern discovery algorithms typically produce many interesting patterns. In most cases, patterns are reported based on their individual merits, and little attention is given to the interestingness of a pattern in the context of the other patterns reported. In this paper, we propose filtering the returned set of patterns based on a number of quality measures for pattern sets. We refer to a small subset of patterns that optimises such a measure as a pattern team. A number of quality measures, both supervised and unsupervised, are proposed. We analyse to what extent each of the measures captures a number of 'intuitions' users may have concerning effective and informative pattern teams. Such intuitions involve qualities such as independence of patterns, low overlap, and combined predictiveness.
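One plausible unsupervised pattern-team measure, joint entropy of the patterns' presence/absence columns, can be sketched exhaustively on toy data; `pattern_team` and `matches` below are illustrative names, not the paper's code or its exact measures.

```python
from collections import Counter
from itertools import combinations
from math import log2

def joint_entropy(rows):
    """Entropy (bits) of the joint distribution of indicator tuples."""
    counts = Counter(rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def pattern_team(matches, k):
    """Exhaustively pick the k patterns whose presence/absence columns
    have maximal joint entropy over the data; teams of independent,
    low-overlap patterns score high, duplicated patterns score low."""
    return max(
        combinations(matches, k),
        key=lambda team: joint_entropy(list(zip(*(matches[p] for p in team)))),
    )

# matches[p][i] == 1 iff pattern p covers data record i (made-up data).
matches = {"p1": (1, 1, 0, 0), "p2": (1, 1, 0, 0), "p3": (1, 0, 1, 0)}
team = pattern_team(matches, 2)
# "p1" and "p2" are identical, so the team pairs one of them with "p3"
```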
Output Space Sampling for Graph Patterns
2009
Abstract

Cited by 22 (5 self)
Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative, or significant. The main motivation is to cope with the scalability problem that graph mining algorithms suffer when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers have also proposed sampling-based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework based on the Metropolis-Hastings algorithm to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility, and efficiency of the proposed sampling approach.
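The Metropolis-Hastings idea can be sketched on a toy discrete space: walk between neighbouring states and accept moves so that visit frequencies become proportional to a score. The paper walks the frequent-subgraph output space instead; `mh_sample` and the toy graph below are assumptions for illustration.

```python
import random
from collections import Counter

def mh_sample(neighbours, score, steps, seed=0):
    """Metropolis-Hastings random walk over a discrete space, visiting
    states with stationary probability proportional to `score`."""
    rng = random.Random(seed)
    current = next(iter(neighbours))
    visited = []
    for _ in range(steps):
        proposal = rng.choice(neighbours[current])
        # acceptance ratio for a uniform-over-neighbours proposal,
        # corrected for the differing degrees of the two states
        accept = (score[proposal] * len(neighbours[current])) / (
            score[current] * len(neighbours[proposal]))
        if rng.random() < accept:
            current = proposal
        visited.append(current)
    return visited

# Toy space: a path a - b - c where "c" is eight times as significant.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
score = {"a": 1.0, "b": 1.0, "c": 8.0}
counts = Counter(mh_sample(neighbours, score, 20000))
# the walk spends roughly 80% of its time at "c"
```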
Time series knowledge mining
2006
Abstract

Cited by 20 (2 self)
An important goal of knowledge discovery is the search for patterns in data that can help explain the underlying process that generated the data. The patterns are required to be new, useful, and understandable to humans. In this work we present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge. The patterns have a hierarchical structure, with each level corresponding to a single temporal concept. On the lowest level, intervals are used to represent duration. Overlapping parts of intervals represent coincidence on the next level. Several such blocks of intervals are connected with a partial-order relation on the highest level. Each pattern element consists of a semiotic triple connecting syntactic and semantic information with pragmatics. The patterns are very compact but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. Efficient algorithms for the discovery of the patterns are proposed: the search for coincidence as well as for partial order can be formulated as variants of the well-known frequent itemset problem, and one of the best-known algorithms for that problem is therefore adapted for our purposes. Human interaction is used during mining to analyze and validate partial results as early as possible and to guide further processing steps. The efficacy of the methods is demonstrated on several data sets. In an application to sports medicine, the results were recognized as valid and useful by an expert in the field.
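The coincidence level (overlapping parts of labelled intervals) can be sketched directly; `coincidences` and the intervals below are invented for illustration, and the actual TSKM discovery is formulated as a frequent-itemset variant rather than this pairwise scan.

```python
def coincidences(intervals):
    """Pairwise overlaps between labelled intervals, i.e. the
    'coincidence' level of the TSKR on toy data. Each interval is a
    (label, start, end) triple."""
    out = []
    for i, (label_a, start_a, end_a) in enumerate(intervals):
        for label_b, start_b, end_b in intervals[i + 1:]:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # a non-empty overlap is a coincidence
                out.append((label_a, label_b, start, end))
    return out

coincidences([("high", 0, 10), ("rising", 5, 12), ("low", 20, 30)])
# -> [("high", "rising", 5, 10)]
```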
Tell me what I need to know: Succinctly summarizing data with itemsets
In Proc. KDD, 2011
Abstract

Cited by 17 (3 self)
Data analysis is an inherently iterative process: what we know about the data greatly determines our expectations, and hence which result we would find the most interesting. With this in mind, we introduce a well-founded approach for succinctly summarizing data with a collection of itemsets; using a probabilistic maximum entropy model, we iteratively find the most interesting itemset and, in turn, update our model of the data accordingly. As we only include itemsets that are surprising with regard to the current model, the summary is guaranteed to be both descriptive and non-redundant. The algorithm that we present can either mine the top-k most interesting itemsets, or use the Bayesian Information Criterion to automatically identify the model containing only the itemsets most important for describing the data; in other words, it will 'tell you what you need to know'. Experiments on synthetic and benchmark data show that the discovered summaries are succinct and correctly identify the key patterns in the data. The models they form attain high likelihoods, and inspection shows that they summarize the data well with increasingly specific, yet non-redundant, itemsets.
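At toy scale, the maximum entropy model can be fitted exactly by iterative proportional fitting, which shows how itemset constraints shape the predictions that surprising itemsets are measured against; `fit_maxent` and `support` are my names, and the paper's solver scales far beyond this brute-force enumeration.

```python
from itertools import product

def fit_maxent(n, constraints, sweeps=200):
    """Maximum-entropy distribution over {0,1}^n matching the given
    itemset supports, fitted by iterative proportional fitting. This
    enumerates all 2^n states, so it is a tiny-scale sketch of the
    modelling idea only, not the paper's solver."""
    states = list(product([0, 1], repeat=n))
    p = dict.fromkeys(states, 1 / len(states))
    for _ in range(sweeps):
        for itemset, target in constraints.items():
            mass = sum(p[s] for s in states if all(s[i] for i in itemset))
            if 0 < mass < 1:
                # rescale states containing / lacking the itemset
                for s in states:
                    if all(s[i] for i in itemset):
                        p[s] *= target / mass
                    else:
                        p[s] *= (1 - target) / (1 - mass)
    return p

def support(p, itemset):
    return sum(q for s, q in p.items() if all(s[i] for i in itemset))

# Knowing only the singleton supports, the model treats the pair as
# independent; an observed pair support far from 0.25 is surprising,
# so that itemset would be added as the next constraint.
model = fit_maxent(2, {(0,): 0.5, (1,): 0.5})
support(model, (0, 1))  # -> 0.25
```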
Mobility Performance of Macrocell-Assisted Small Cells in Manhattan Model
In IEEE 79th Vehicular Technology Conference (VTC Spring), 2014
Abstract

Cited by 16 (1 self)
Recent research efforts have made notable progress in improving the performance of (exhaustive) maximal clique enumeration (MCE). However, existing algorithms still suffer from exploring the huge search space of MCE. Furthermore, their results are often undesirable, as many of the returned maximal cliques have large overlapping parts. This redundancy leads to problems in both the computational efficiency and the usefulness of MCE. In this paper, we aim at providing a concise and complete summary of the set of maximal cliques, which is useful to many applications. We propose the notion of τ-visible MCE to achieve this goal and design algorithms to realize the notion. Based on the refined output space, we further consider applications including efficient computation of the top-k results with diversity and an interactive clique exploration process. Our experimental results demonstrate that our approach is capable of producing output of high usability and that our algorithms achieve superior efficiency over classic MCE algorithms.
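As a baseline for what the paper summarizes, classic exhaustive MCE can be sketched with the Bron-Kerbosch algorithm; the adjacency data below is a toy example.

```python
def maximal_cliques(adj):
    """Classic Bron-Kerbosch enumeration of all maximal cliques, the
    exhaustive MCE baseline whose redundant output a summary such as
    the paper's τ-visible MCE condenses."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

adj = {  # triangle 0-1-2 plus the pendant edge 2-3
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2},
}
maximal_cliques(adj)  # -> cliques {0, 1, 2} and {2, 3}
```

Even on this toy graph the two maximal cliques overlap in vertex 2; on dense graphs such overlaps multiply, which is the redundancy the abstract targets.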
Directly Mining Descriptive Patterns
In SIAM SDM, 2012
Abstract

Cited by 15 (4 self)
Mining small, useful, high-quality sets of patterns has recently become an important topic in data mining. The standard approach is to first mine many candidates and then select a good subset. However, the pattern explosion generates such enormous numbers of candidates that, with post-processing, it is virtually impossible to analyse dense or large databases in any detail. We introduce Slim, an anytime algorithm for mining high-quality sets of itemsets directly from data. We use MDL to identify the best set of itemsets as the set that describes the data best. To approximate this optimum, we iteratively use the current solution to determine which itemset would provide the most gain, estimating quality using an accurate heuristic. Without requiring a pre-mined candidate collection, Slim is parameter-free in both theory and practice. Experiments show that we mine high-quality pattern sets; while evaluating orders of magnitude fewer candidates than our closest competitor, Krimp, we obtain much better compression ratios, closely approximating the locally optimal strategy. Classification experiments independently verify that we characterise data very well.
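The MDL intuition, that a pattern set is good if it compresses the data well, can be sketched with a toy cover-based code length; `cover_length` is a much-simplified score in the spirit of Krimp/Slim, omitting their exact two-part code-table encoding, and the data is made up.

```python
from math import log2

def cover_length(transactions, patterns):
    """Greedily cover each transaction with the given patterns (longest
    first, falling back to singletons) and score the cover with a
    Shannon-style code length: frequently used codes become cheap."""
    usage = {}
    for t in transactions:
        remaining = set(t)
        for p in sorted(patterns, key=len, reverse=True):
            if p <= remaining:
                usage[p] = usage.get(p, 0) + 1
                remaining -= p
        for item in remaining:  # singletons cover the leftovers
            s = frozenset([item])
            usage[s] = usage.get(s, 0) + 1
    total = sum(usage.values())
    return sum(u * -log2(u / total) for u in usage.values())

data = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}]
singletons_only = cover_length(data, [])
with_ab = cover_length(data, [frozenset({"a", "b"})])
# adding the candidate {a, b} shortens the encoding, so a Slim-style
# greedy search would accept it and iterate
```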