Results 1–10 of 17
Fast Construction of k-Nearest Neighbor Graphs for Point Clouds
Cited by 28 (1 self)
Abstract—We present a parallel algorithm for k-nearest neighbor graph construction that uses Morton ordering. Experiments show that our approach has the following advantages over existing methods: (1) Faster construction of k-nearest neighbor graphs in practice on multicore machines. (2) Less space usage. (3) Better cache efficiency. (4) Ability to handle large data sets. (5) Ease of parallelization and implementation. If the point set has a bounded expansion constant, our algorithm requires one comparison-based parallel sort of points according to Morton order plus near-linear additional steps to output the k-nearest neighbor graph. Index Terms—Nearest neighbor searching, point-based graphics, k-nearest neighbor graphs, Morton ordering, parallel algorithms.
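The idea summarized in this abstract (sort points by Morton order, then look for each point's neighbors nearby in that order) can be illustrated with a minimal serial sketch. This is not the authors' parallel algorithm; the function names, 2D integer coordinates, and fixed candidate window are assumptions made for illustration only.

```python
def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative integer coordinates
    into a single Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits -> even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits -> odd positions
    return code

def morton_knn_graph(points, k, window=8):
    """Approximate kNN graph: sort points by Morton code, then for each
    point rank only the points within +/- `window` positions of it."""
    order = sorted(range(len(points)), key=lambda i: morton2d(*points[i]))
    graph = {}
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - window), min(len(order), pos + window + 1)
        cands = [j for j in order[lo:hi] if j != i]
        cands.sort(key=lambda j: (points[i][0] - points[j][0]) ** 2
                                 + (points[i][1] - points[j][1]) ** 2)
        graph[i] = cands[:k]
    return graph
```

Roughly speaking, points close in space tend to be close in Morton order, so a small window already finds most true neighbors; the bounded-expansion assumption in the abstract is what lets the paper bound the extra work beyond the single sort.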
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
, 2008
Cited by 26 (4 self)
Nearest neighbor graphs are widely used in data mining and machine learning. The brute-force method to compute the exact kNN graph takes Θ(dn^2) time for n data points in the d-dimensional Euclidean space. We propose two divide-and-conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high-dimensional data (large d). The exponent t depends on an internal parameter and is larger than one. Experiments show that a high-quality graph usually requires a small t which is close to one. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide-and-conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
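A toy serial sketch of the divide-and-conquer scheme described in this abstract: power iteration on the covariance stands in here for the paper's inexpensive Lanczos procedure, and the two halves overlap so that neighbor pairs straddling the cut can still be found. Every name and parameter below (`bisect_knn_graph`, `leaf_size`, `overlap`) is an assumption of the sketch, not the authors' code.

```python
import numpy as np

def brute_knn_pairs(P, k):
    """Exact kNN among the rows of P (small sets only)."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d, np.argsort(d, axis=1)[:, :k]

def bisect_knn_graph(X, k, leaf_size=32, overlap=0.2, idx=None, cand=None):
    """Recursively bisect X along its principal direction; solve small
    subproblems exactly and pool the candidate neighbors per point."""
    if idx is None:
        idx, cand = np.arange(len(X)), {i: {} for i in range(len(X))}
    if len(idx) <= leaf_size:
        P = X[idx]
        d, local = brute_knn_pairs(P, min(k, len(idx) - 1))
        for a, row in enumerate(local):
            for b in row:
                cand[idx[a]][idx[b]] = d[a, b]
        return cand
    P = X[idx] - X[idx].mean(axis=0)
    v = np.random.default_rng(0).standard_normal(X.shape[1])
    for _ in range(20):              # power iteration: a cheap stand-in
        v = P.T @ (P @ v)            # for the Lanczos bisection step
        v /= np.linalg.norm(v)
    order = np.argsort(P @ v)        # split at the median projection
    half, pad = len(idx) // 2, int(overlap * len(idx) / 2)
    bisect_knn_graph(X, k, leaf_size, overlap, idx[order[:half + pad]], cand)
    bisect_knn_graph(X, k, leaf_size, overlap, idx[order[half - pad:]], cand)
    return cand

def finalize(cand, k):
    """Keep the k closest candidates found for each point."""
    return {i: sorted(d, key=d.get)[:k] for i, d in cand.items()}
```

Larger `overlap` raises graph quality at the cost of more repeated work, which mirrors the t-versus-accuracy trade-off the abstract describes.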
Hit Miss Networks with Applications to Instance Selection
Cited by 10 (2 self)
In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as the presence of noisy instances and its size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called a hit miss network (HMN), which provides a compact description of the nearest neighbor relation between each pair of classes. We show that structural properties of HMNs correspond to properties of training points related to the one-nearest-neighbor (1-NN) decision rule, such as being a border or central point. This motivates us to use HMNs for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new algorithms based on HMNs for instance selection: HMN-C, which removes instances without affecting the accuracy of 1-NN on the training set; HMN-E, which removes more instances than HMN-C; and HMN-EI, which applies HMN-E iteratively. Their performance is assessed on 22 artificial and real-life datasets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these datasets show that the accuracy of the 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these datasets indicates the best generalization performance for HMN-EI and no significant difference in storage requirements. In general, these results show that HMNs provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundancy reduction in instance-based learning.
Keywords: Graph-based training set representation, nearest neighbor, instance selection for instance-based learning.
Continuous All k-Nearest Neighbor Querying in Smartphone Networks
Cited by 8 (6 self)
Abstract—Consider a centralized query operator that identifies to every smartphone user its k geographically nearest neighbors at all times, a query we coin Continuous All k-Nearest Neighbor (CAkNN). Such an operator could be utilized to enhance public emergency services, allowing users to send SOS beacons out to the closest rescuers, or allowing gamers and social networking users to establish ad-hoc overlay communication infrastructures in order to carry out complex interactions. In this paper, we study the problem of efficiently processing a CAkNN query in a cellular or WiFi network, both of which are ubiquitous. We introduce an algorithm, coined Proximity, which answers CAkNN queries in O(n(k+λ)) time, where n denotes the number of users and λ a network-specific parameter (λ ≪ n). Proximity does not require any additional infrastructure or specialized hardware, and its efficiency is mainly attributed to a smart search-space sharing technique we introduce. Its implementation is based on a novel data structure, coined k+-heap, which achieves constant O(1) lookup time and logarithmic O(log(k·λ)) insertion/update time. Proximity, being parameter-free, performs efficiently in the face of high mobility and skewed distribution of users (e.g., the service works equally well in downtown, suburban, or rural areas). We have evaluated Proximity using mobility traces from two sources and concluded that our approach performs at least one order of magnitude faster than adapted existing work.
Parallel Construction of k-Nearest Neighbor Graphs for Point Clouds
, 2008
Cited by 6 (1 self)
We present a parallel algorithm for k-nearest neighbor graph construction that uses Morton ordering. Experiments show that our approach has the following advantages over existing methods: (1) Faster construction of k-nearest neighbor graphs in practice on multicore machines. (2) Less space usage. (3) Better cache efficiency. (4) Ability to handle large data sets. (5) Ease of parallelization and implementation.
Online Document Clustering Using the GPU
, 2010
Cited by 5 (4 self)
Online document clustering takes as its input a list of document vectors, ordered by time. A document vector consists of a list of K terms and their associated weights. The generation of terms and their weights from the document text may vary, but the TF-IDF (term frequency-inverse document frequency) method is popular for clustering applications [1]. The assumption is that the resulting document vector is a good overall representation of the original document. We note that the dimensionality of the document vectors is very high (potentially infinite), since a document could potentially contain any word (term). We also note that the vectors are sparse in the sense that most term weights have a zero value. We assume that each term not explicitly present in a particular document vector has a weight of zero. Document vectors are normalized. Clusters are also represented as a list of weighted terms. At any given time, a cluster's term vector is equal to the average of all the document vectors contained by the cluster. Cluster term vectors are truncated to the top K terms (those with the highest term weights). Cluster term vectors are kept normalized. The objective of the algorithm is to partition the set of document vectors into a set of clusters, each cluster containing only those documents which are similar to each other with respect to some metric. For this paper, we consider the Euclidean dot product as the similarity metric, as it has been shown to provide good results with the TF-IDF method [1]. The similarity between a cluster and a document is defined as the dot product between their term vectors. We first present a serial algorithm for online clustering. We then describe a PRAM algorithm for parallel online clustering, assuming a CRCW model. Finally, we present a practical implementation of an approximate parallel online clustering algorithm, suitable for the CUDA parallel computing architecture [2].
1. Serial Clustering. The basic serial online clustering algorithm takes as input a list of n document vectors, as well as a clustering threshold T ranging between 0 and 1. Below is a high-level overview of the algorithm.
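The setup described above (normalized sparse term vectors, dot-product similarity, a threshold T) suggests the following minimal serial sketch. The greedy assign-or-create rule and the names (`online_cluster`, `top_k`) are assumptions of this sketch, not the paper's pseudocode.

```python
import math

def normalize(vec):
    """L2-normalize a sparse term-weight vector (dict: term -> weight)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def dot(a, b):
    """Dot product of two sparse vectors; absent terms weigh zero."""
    if len(b) < len(a):
        a, b = b, a                       # iterate over the smaller dict
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def online_cluster(docs, threshold, top_k=100):
    """Assign each incoming document to the most similar cluster if the
    similarity exceeds `threshold`; otherwise start a new cluster."""
    clusters = []                         # list of (centroid, members)
    for doc in docs:
        doc = normalize(doc)
        best, best_sim = None, threshold
        for c in clusters:
            sim = dot(doc, c[0])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append(({**doc}, [doc]))
        else:
            best[1].append(doc)
            # centroid = mean of members, truncated to top_k, normalized
            mean = {}
            for m in best[1]:
                for t, w in m.items():
                    mean[t] = mean.get(t, 0.0) + w / len(best[1])
            top = dict(sorted(mean.items(), key=lambda kv: -kv[1])[:top_k])
            best[0].clear()
            best[0].update(normalize(top))
    return clusters
```

With T near 1 almost every document starts its own cluster; with T near 0 everything merges into one, which is why T is the algorithm's main tuning knob.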
Optimizing All-Nearest-Neighbor Queries with Trigonometric Pruning
Cited by 5 (0 self)
Abstract. Many applications require determining the k-nearest neighbors for multiple query points simultaneously. This task is known as an all-(k)-nearest-neighbor (AkNN) query. In this paper, we suggest a new method for efficient AkNN query processing which is based on spherical approximations for indexing and query set representation. In this setting, we propose trigonometric pruning, which enables a significant decrease of the remaining search space for a query. Employing this new pruning method, we considerably speed up AkNN queries.
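The kind of spherical pruning this abstract builds on can be illustrated with the standard minimum-distance bound between two balls (one enclosing a group of query points, one enclosing a group of data points): if that lower bound already exceeds the current k-th nearest distance for every query in the ball, the data group can be skipped wholesale. This is only the generic bound; the paper's trigonometric pruning is a tighter criterion not reproduced here, and all names below are illustrative.

```python
import math

def min_ball_dist(c1, r1, c2, r2):
    """Lower bound on the distance between any point in ball (c1, r1)
    and any point in ball (c2, r2); zero if the balls overlap."""
    return max(0.0, math.dist(c1, c2) - r1 - r2)

def can_prune(query_ball, data_ball, worst_knn_dist):
    """Skip a data group if even its closest possible point lies farther
    away than the current k-th nearest distance of every query inside
    the query ball."""
    (qc, qr), (dc, dr) = query_ball, data_ball
    return min_ball_dist(qc, qr, dc, dr) > worst_knn_dist
```

Pruning whole query-group/data-group pairs at once is what distinguishes AkNN processing from answering each kNN query independently.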
Saliency-assisted navigation of very large landscape images
 IEEE Transactions on Visualization and Computer Graphics
Cited by 4 (1 self)
Abstract—The field of visualization has addressed navigation of very large datasets, usually meshes and volumes. Significantly less attention has been devoted to the issues surrounding navigation of very large images. In the last few years, the explosive growth in the resolution of camera sensors and robotic image acquisition techniques has widened the gap between the display and image resolutions to three orders of magnitude or more. This paper presents the first steps towards navigation of very large images, particularly landscape images, from an interactive visualization perspective. The grand challenge in navigation of very large images is identifying regions of potential interest. In this paper we outline a three-step approach. In the first step we use multiscale saliency to narrow down the potential areas of interest. In the second step we outline a method based on statistical signatures to further cull out regions of high conformity. In the final step we allow a user to interactively identify the exceptional regions of high interest that merit further attention. We show that our approach of progressive elicitation is fast and allows rapid identification of regions of interest. Unlike previous work in this area, our approach is scalable and computationally reasonable on very large images. We validate the results of our approach by comparing them to user-tagged regions of interest on several very large landscape images from the Internet.
Georeferenced Point Clouds: A Survey of Features and Point Cloud Management
, 2013
Cited by 3 (3 self)
Abstract: This paper presents a survey of georeferenced point clouds. The focus is, on the one hand, on features that originate in the measurement process itself and on features derived by processing the point cloud. On the other hand, approaches for the processing of georeferenced point clouds are reviewed. This includes data structures, but also spatial processing concepts. We suggest a categorization of features into levels that reflect the amount of processing. Point clouds are found across many disciplines, which is reflected in the versatility of the literature suggesting specific features.
Breaking the Fog: Defining and Orienting Surfaces in Complex Point Cloud Datasets
Cited by 2 (0 self)
Abstract — We present a vertex clustering algorithm for the purposes of surface determination and normal estimation that can help provide detailed visualizations of complex point cloud datasets. The proposed method combines a novel bucket-and-layer spatial partitioning scheme with an iterative process for surface subdivision based on the optimization of quality-of-fit statistics. Our approach can efficiently decompose and approximate a dataset through the local classification and fitting of surface regions. The algorithm uses a standard least squares approach combined with Delaunay-based triangulation for developing these approximated surfaces. To demonstrate the effectiveness of our approach, we execute the algorithm on several real-world datasets scanned from complex environments. We perform an analysis of the various techniques presented and provide a comparison of our approach with the standard k-nearest neighbors method commonly used for solving this problem. Through this performance analysis we show that as the complexity of the datasets increases, the performance and accuracy of our proposed approach continue to function at an effective level. Index Terms—Vertex clustering, surface determination, normal estimation, layered surfaces, uncertain environments.