Mining co-location patterns with rare events from spatial data sets
Geoinformatica
Cited by 24 (2 self)
A co-location pattern is a group of spatial features/events that are frequently co-located in the same region. For example, human cases of West Nile Virus often occur in regions with poor mosquito control and the presence of birds. For co-location pattern mining, previous studies often emphasize the equal participation of every spatial feature. As a result, interesting patterns involving events with substantially different frequencies cannot be captured. In this paper, we address the problem of mining co-location patterns with rare spatial features. Specifically, we first propose a new measure called the maximal participation ratio (maxPR) and show that a co-location pattern with a relatively high maxPR value corresponds to a co-location pattern containing rare spatial events. Furthermore, we identify a weak monotonicity property of the maxPR measure. This property can help to develop an efficient algorithm to mine patterns with high maxPR values. As demonstrated
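A minimal sketch of the maxPR idea, assuming the usual co-location definition of a participation ratio (the fraction of a feature's instances that appear in at least one row instance of the pattern) and assuming the row instances have already been found. The function names and toy data are hypothetical, and the weak-monotonicity pruning from the paper is not shown.

```python
from collections import defaultdict


def participation_ratios(pattern, row_instances, instance_counts):
    """Participation ratio of each feature in a candidate co-location pattern.

    pattern: tuple of feature names, e.g. ("WNV", "Bird").
    row_instances: list of tuples, one instance id per feature (same order as
        pattern); each tuple is one neighbourhood where all features co-occur.
    instance_counts: dict mapping feature -> total number of its instances.
    """
    participating = defaultdict(set)
    for row in row_instances:
        for feature, instance_id in zip(pattern, row):
            participating[feature].add(instance_id)
    return {f: len(participating[f]) / instance_counts[f] for f in pattern}


def max_pr(pattern, row_instances, instance_counts):
    # maxPR: the largest participation ratio over the pattern's features.
    return max(participation_ratios(pattern, row_instances, instance_counts).values())


def participation_index(pattern, row_instances, instance_counts):
    # Conventional measure, for comparison: the smallest participation ratio.
    return min(participation_ratios(pattern, row_instances, instance_counts).values())


# Hypothetical toy data: 2 rare WNV cases, 10 bird sightings.
pattern = ("WNV", "Bird")
rows = [("w1", "b1"), ("w1", "b2"), ("w2", "b3")]
counts = {"WNV": 2, "Bird": 10}
print(max_pr(pattern, rows, counts))               # 1.0
print(participation_index(pattern, rows, counts))  # 0.3
```

Here every rare WNV case takes part in the pattern, so maxPR is high even though the participation index, dragged down by the common bird feature, is low.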
Mining spatio-temporal association rules, sources, sinks, stationary regions and thoroughfares in object mobility databases
Lecture Notes in Computer Science, 2006
Cited by 22 (3 self)
As mobile devices proliferate and networks become more location-aware, the corresponding growth in spatio-temporal data will demand analysis techniques to mine patterns that take into account the semantics of such data. Association rule mining has been one of the more extensively studied data mining techniques, but it considers discrete transactional data (supermarket or sequential). Most attempts to apply this technique to spatio-temporal domains map the data to transactions, thus losing the spatio-temporal characteristics. We provide a comprehensive definition of spatio-temporal association rules (STARs) that describe how objects move between regions over time. We define support in the spatio-temporal domain to effectively deal with the semantics of such data. We also introduce other patterns that are useful for mobility data: stationary regions and high-traffic regions. The latter consist of sources, sinks and thoroughfares. These patterns describe important temporal characteristics of regions, and we show that they can be considered special STARs. We provide efficient algorithms to find these patterns by exploiting several pruning properties.
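As a rough illustration of what a STAR expresses, the sketch below counts how many objects move from one region to another between consecutive time steps and keeps the frequent moves. It uses a plain object count as support, which is a simplifying assumption and not the spatio-temporal support defined in the paper; `movement_rules` and the toy trajectories are hypothetical.

```python
from collections import Counter


def movement_rules(trajectories, min_count):
    """Count (source region, destination region, time step) transitions.

    trajectories: dict mapping object id -> list of region labels, one per
        time step. A transition (r, r, t) means the object stayed in r,
        which is how a stationary region shows up in this sketch.
    Returns only transitions made by at least min_count objects.
    """
    counts = Counter()
    for regions in trajectories.values():
        for t in range(len(regions) - 1):
            counts[(regions[t], regions[t + 1], t)] += 1
    return {rule: n for rule, n in counts.items() if n >= min_count}


trajectories = {
    "o1": ["A", "B", "B"],
    "o2": ["A", "B", "C"],
    "o3": ["C", "C", "C"],
}
print(movement_rules(trajectories, min_count=2))  # {('A', 'B', 0): 2}
```

Treating "stay in the same region" as a transition mirrors the abstract's remark that stationary regions can be seen as special STARs, though the paper's definitions are richer than this count.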
Complex Spatial Relationships
In Proc. of the 3rd IEEE International Conference on Data Mining (ICDM), 2003
Cited by 18 (0 self)
This paper describes the need for mining complex relationships in spatial data. Complex relationships are defined as those involving two or more of: multi-feature co-location, self-co-location, one-to-many relationships, self-exclusion and multi-feature exclusion. We demonstrate that even in the mining of simple relationships, knowledge of complex relationships is necessary to accurately calculate the significance of results. We implement a representation of spatial data that has known 'weak-monotonic' properties, which are exploited for the efficient mining of complex relationships, and discuss the strengths and limitations of this representation.
High-Confidence Rule Mining for Microarray Analysis
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2007
Cited by 18 (0 self)
We present an association rule mining method for mining high-confidence rules, which describe interesting gene relationships, from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical, as they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms rely on the support measure to prune infrequent relationships and so reduce the search space. This is a major shortcoming: it prunes many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MAXCONF, to mine high-confidence rules from microarray data. MAXCONF is a support-free algorithm that directly uses the confidence measure to effectively prune the search space. Experiments on three microarray data sets show that MAXCONF outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MAXCONF are substantially more interesting and meaningful than those found by support-based methods. Index Terms: Data mining, association rules, high-confidence rule mining, microarray analysis.
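To make the support-versus-confidence argument concrete, the sketch below computes both measures for a single rule over a toy expression matrix. It is not MAXCONF (no row enumeration or confidence-based pruning of the search space is shown); the item names and the 0.3 minimum-support threshold are hypothetical.

```python
def support(itemset, rows):
    # Number of rows (experiments) that contain every item in itemset.
    return sum(1 for row in rows if itemset <= row)


def confidence(antecedent, consequent, rows):
    # conf(A -> B) = support(A and B) / support(A)
    covered = support(antecedent, rows)
    return support(antecedent | consequent, rows) / covered if covered else 0.0


# Hypothetical data: 10 experiments, each a set of gene-expression items.
rows = [{"g1_up", "g2_up"} for _ in range(2)] + [{"g3_up"} for _ in range(8)]
a, b = {"g1_up"}, {"g2_up"}

rel_support = support(a | b, rows) / len(rows)
print(rel_support)             # 0.2: would be pruned by a 0.3 minimum support
print(confidence(a, b, rows))  # 1.0: kept by a confidence-based filter
```

The rule g1_up -> g2_up holds in every experiment where g1_up appears, yet a support threshold discards it, which is the case the abstract argues support-based pruning loses.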
Parameter-free spatial data mining using MDL
In 5th International Conference on Data Mining (ICDM), 2005
Cited by 7 (1 self)
Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and feature co-occurrence patterns, without any parameters. In particular, we employ the Minimum Description Length (MDL) principle coupled with a natural way of compressing regions. This defines what “good” means: a feature co-occurrence pattern is good if it helps us better compress the set of locations for these features. Conversely, a spatial correlation is good if it helps us better compress the set of features in the corresponding region. Our approach scales to large datasets (in both the number of locations and the number of features). We evaluate our method on both real and synthetic datasets.
Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol
Cited by 7 (2 self)
Crisis-affected populations are often able to maintain digital communications, but in a sudden-onset crisis, aid organizations will have the fewest free resources to process such communications. Information that aid agencies can actually act on, ‘actionable’ information, will be sparse, so there is great potential to (semi-)automatically identify actionable communications. However, there are hurdles: the languages spoken will often be under-resourced and have orthographic variation, and the precise definition of ‘actionable’ will be response-specific and evolving. We present a novel system that addresses this, drawing on 40,000 emergency text messages sent in Haiti following the January 12, 2010 earthquake, predominantly in Haitian Kreyol. We show that keyword/n-gram-based models using streaming MaxEnt achieve up to F = 0.21. Further, we find that current state-of-the-art subword models increase this substantially to F = 0.33, while modeling the spatial, temporal, topic and source contexts of the messages can increase this to a very accurate F = 0.86 over direct text messages and F = 0.90-0.97 over social media, making it a viable strategy for message prioritization.
Rule discovery and probabilistic modeling for onomastic data
In PKDD, 2003
Cited by 4 (1 self)
The naming of natural features, such as hills, lakes, springs and meadows, provides a wealth of linguistic information; the study of names and naming systems is called onomastics. We consider a data set containing all names and locations of about 58,000 lakes in Finland. Using computational techniques, we address two major onomastic themes. First, we address the existence of local dependencies or repulsion between occurrences of names. For this, we derive a simple form of spatial association rules. The results partially validate and partially contradict results obtained by traditional onomastic techniques. Second, we consider the existence of relatively homogeneous spatial regions with respect to the distributions of place names. Using mixture modeling, we conduct a global analysis of the data set. The clusterings of regions are spatially connected and correspond quite well with the results obtained by other techniques; there are, however, interesting differences with previous hypotheses.
k-STARs: Sequences of Spatio-Temporal Association Rules
2006
Cited by 4 (1 self)
A Spatio-Temporal Association Rule (STAR) describes how objects move between regions over time. Since STARs describe only a single movement between two regions, it is very difficult to see larger patterns in the dataset by considering only the set of STARs. It is especially difficult on complex datasets where the underlying patterns overlap. At best, we will miss important patterns, being unable to “see the forest for the trees”; at worst, this can lead to false interpretations. We introduce the k-STAR pattern, which describes the sequences of STARs that objects obey. Since a k-STAR captures sequences of object movements, it solves these problems. We also allow space and time gaps between successive STARs and support ‘replenishable’ k-STARs, so we are able to capture the rich set of patterns that exist in real-world data. We define a lattice on the k-STARs that allows the user to drill down and drill up in order to explore the patterns in detail or view them at a higher level. We introduce two important measures, min-l-support and min-l-confidence, that allow us to achieve the above. This paper gives a rigorous theoretical treatment of k-STARs, proving various anti-monotonic and weakly anti-monotonic properties that can be exploited to mine k-STARs efficiently. We describe an algorithm, k-STARMiner, that uses these results to mine the lattice of k-STARs.
Density Based Co-Location Pattern Discovery
Cited by 4 (0 self)
Co-location pattern discovery aims to find classes of spatial objects that are frequently located together. For example, if two categories of businesses are often located together, they might be identified as a co-location pattern; if several biological species frequently live in nearby places, they might form a co-location pattern. Most existing co-location pattern discovery methods are generate-and-test methods: they generate candidates and test each candidate to determine whether it is a co-location pattern. In the test step, we identify instances of a candidate to obtain its prevalence. In general, instance identification is very costly. To reduce the computational cost of identifying instances, we propose a density-based approach. We divide objects into partitions and identify instances in dense partitions first. A dynamic upper bound of the prevalence for a candidate is maintained. If the current upper bound becomes less than a threshold, we stop identifying its instances in the remaining partitions. We prove that our approach is complete and correct in finding co-location patterns. Experimental results on real data sets show that our method outperforms a traditional approach.
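A minimal sketch of the early-stopping idea, assuming prevalence is measured by the participation index (the minimum participation ratio over the pattern's features), that pattern instances do not cross partition boundaries, and that per-partition participation counts stand in for the costly instance-identification step. The bound used here (instances found so far plus every not-yet-examined instance) is an illustrative choice; the paper's own upper bound and partitioning scheme may differ, and all names are hypothetical.

```python
def prevalent_with_early_stop(features, partitions, totals, min_prev):
    """Decide whether a candidate pattern reaches min_prev, stopping early.

    partitions: list of dicts with
        "counts": feature -> number of its instances in the partition
        "participating": feature -> how many of those take part in a pattern
            instance (a stand-in for instance identification)
    totals: feature -> total number of instances across all partitions.
    Returns (is_prevalent, number_of_partitions_examined).
    """
    # Densest partitions first, so the bound tightens quickly.
    order = sorted(partitions, key=lambda p: sum(p["counts"].values()), reverse=True)
    found = {f: 0 for f in features}  # participating instances seen so far
    unexamined = {f: totals[f] for f in features}
    for examined, part in enumerate(order, start=1):
        for f in features:
            found[f] += part["participating"].get(f, 0)
            unexamined[f] -= part["counts"].get(f, 0)
        # Optimistic bound: every unexamined instance might still participate.
        upper = min((found[f] + unexamined[f]) / totals[f] for f in features)
        if upper < min_prev:
            return False, examined
    prevalence = min(found[f] / totals[f] for f in features)
    return prevalence >= min_prev, len(order)


partitions = [
    {"counts": {"A": 4, "B": 4}, "participating": {"A": 1, "B": 1}},
    {"counts": {"A": 3, "B": 3}, "participating": {}},
    {"counts": {"A": 3, "B": 3}, "participating": {}},
]
totals = {"A": 10, "B": 10}
print(prevalent_with_early_stop(("A", "B"), partitions, totals, 0.5))  # (False, 2)
```

With a 0.5 threshold, the bound drops from 0.7 to 0.4 after the second partition, so the third partition is never examined.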
Mining Viewpoint Patterns in Image Databases
Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
Cited by 3 (0 self)
The increasing number of image repositories has made image mining an important task because of its potential for discovering useful image patterns from a large set of images. In this paper, we introduce the notion of viewpoint patterns for image databases. Viewpoint patterns are patterns that capture the invariant relationships of one object from the point of view of another object. These patterns are unique and significant in images because, for most images, the absolute positional information of objects is not important; rather, it is the relative distance and orientation of the objects from each other that is meaningful. We design a scalable and efficient algorithm to discover such viewpoint patterns. Experimental results on various image sets demonstrate that viewpoint patterns are meaningful and interesting to human users.
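The relative-position idea can be illustrated with a small sketch that describes each ordered pair of objects by their distance and a quantized direction, then keeps descriptors that recur across images. The object centroids, the 8 direction sectors, the distance binning and the function names are all illustrative assumptions; the sketch is invariant to translation only and is not the paper's algorithm.

```python
import math
from collections import Counter


def viewpoint(a, b, sectors=8):
    """Distance and direction sector of point b as seen from point a.

    Only the offset b - a is used, so the result does not depend on where
    the pair sits in the image (translation invariance).
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle // (2 * math.pi / sectors)) % sectors
    return math.hypot(dx, dy), sector


def frequent_viewpoints(images, dist_bin, min_images, sectors=8):
    """Count (label_a, label_b, distance bin, sector) keys over images.

    images: list of images, each a list of (label, (x, y)) objects.
    A key is counted at most once per image and kept if it occurs in at
    least min_images images.
    """
    counts = Counter()
    for objects in images:
        keys = set()
        for i, (label_a, pos_a) in enumerate(objects):
            for j, (label_b, pos_b) in enumerate(objects):
                if i != j:
                    dist, sector = viewpoint(pos_a, pos_b, sectors)
                    keys.add((label_a, label_b, int(dist // dist_bin), sector))
        counts.update(keys)
    return {key: n for key, n in counts.items() if n >= min_images}


images = [
    [("sun", (0, 10)), ("tree", (0, 0))],
    [("sun", (5, 12)), ("tree", (5, 2))],
]
patterns = frequent_viewpoints(images, dist_bin=5, min_images=2)
print(patterns[("tree", "sun", 2, 2)])  # 2
```

The tree/sun pair sits at different absolute positions in the two images, but the same relative offset yields the same key, so it is reported as a recurring viewpoint.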