USING MUTUAL PROXIMITY TO IMPROVE CONTENT-BASED AUDIO SIMILARITY
Abstract

Cited by 12 (3 self)
This work introduces Mutual Proximity, an unsupervised method which transforms arbitrary distances to similarities computed from the shared neighborhood of two data points. This reinterpretation aims to correct inconsistencies in the original distance space, like the hub phenomenon. Hubs are objects which appear unwontedly often as nearest neighbors in predominantly high-dimensional spaces. We apply Mutual Proximity to a widely used and standard content-based audio similarity algorithm. The algorithm is known to be negatively affected by the high number of hubs it produces. We show that without a modification of the audio similarity features or inclusion of additional knowledge about the datasets, applying Mutual Proximity leads to a significant increase in retrieval quality: (1) hubs decrease and (2) the k-nearest-neighbor classification rates increase significantly. The results of this paper show that taking the mutual neighborhood of objects into account is an important aspect which should be considered for this class of content-based audio similarity algorithms.
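The transformation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the distances emanating from each point roughly follow a normal distribution, estimates that distribution per point, and rescores each pair by the joint probability (under an independence assumption) that a random point lies farther from both than they are from each other.

```python
import numpy as np
from scipy.stats import norm

def mutual_proximity(D):
    """Transform a symmetric distance matrix into Mutual-Proximity-style
    similarities. Each row's distances are modeled as a normal
    distribution; the similarity of x and y is the probability that a
    random point is farther from both x and y than d(x, y)."""
    n = D.shape[0]
    mu = D.mean(axis=1)   # mean distance from each point
    sd = D.std(axis=1)    # std of distances from each point
    S = np.zeros_like(D, dtype=float)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # P(D_i > d_ij) * P(D_j > d_ij); norm.sf is the survival function
            S[i, j] = norm.sf(D[i, j], mu[i], sd[i]) * \
                      norm.sf(D[i, j], mu[j], sd[j])
    return S
```

Ranking neighbors by descending `S` instead of ascending `D` is then enough to plug the transform into a k-NN retrieval pipeline.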
M.: A probabilistic approach to nearest neighbor classification: naive hubness Bayesian k-nearest neighbor
 In: Proceedings of the CIKM Conference (2011)
Abstract

Cited by 6 (0 self)
Most machine-learning tasks, including classification, involve dealing with high-dimensional data. It was recently shown that the phenomenon of hubness, inherent to high-dimensional data, can be exploited to improve methods based on nearest neighbors (NNs). Hubness refers to the emergence of points (hubs) that appear among the k NNs of many other points in the data, and constitute influential points for kNN classification. In this paper, we present a new probabilistic approach to kNN classification, naive hubness Bayesian k-nearest neighbor (NHBNN), which employs hubness for computing class likelihood estimates. Experiments show that NHBNN compares favorably to different variants of the kNN classifier, including probabilistic kNN (PNN) which is often used as an underlying probabilistic framework for NN classification, signifying that NHBNN is a promising alternative framework for developing probabilistic NN algorithms.
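A compact sketch of the idea follows. It is not the paper's exact estimator: the Euclidean metric, the Laplace-style smoothing constant `lam`, and all function names are our assumptions. Class likelihoods for a query's neighbors are derived from class-conditional k-occurrence counts collected on the training set, then combined naively.

```python
import numpy as np

def nhbnn_predict(X_train, y_train, x_test, k=3, lam=1.0):
    """Naive hubness Bayesian k-NN (illustrative sketch)."""
    n = len(X_train)
    classes = np.unique(y_train)
    # k-NN lists within the training set (excluding self)
    D = np.linalg.norm(X_train[:, None] - X_train[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]
    # N_kc[i, c]: how often point i occurs in k-NN lists of class-c points
    N_kc = np.zeros((n, len(classes)))
    for i, nbrs in enumerate(knn):
        ci = np.searchsorted(classes, y_train[i])
        for j in nbrs:
            N_kc[j, ci] += 1
    # classify the test point via naive Bayes over its k neighbors
    d = np.linalg.norm(X_train - x_test, axis=1)
    nbrs = np.argsort(d)[:k]
    priors = np.bincount(np.searchsorted(classes, y_train),
                         minlength=len(classes)) / n
    log_post = np.log(priors)
    for j in nbrs:
        # smoothed class likelihood from neighbor j's occurrence profile
        probs = (N_kc[j] + lam) / (N_kc[j].sum() + lam * len(classes))
        log_post = log_post + np.log(probs)
    return classes[np.argmax(log_post)]
```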
LOCALIZED DISCRETE EMPIRICAL INTERPOLATION METHOD
Abstract

Cited by 5 (0 self)
This paper presents a new approach to construct more efficient reduced-order models for nonlinear partial differential equations with proper orthogonal decomposition (POD) and the discrete empirical interpolation method (DEIM). Whereas DEIM projects the nonlinear term onto one global subspace, our localized discrete empirical interpolation method (LDEIM) computes several local subspaces, each tailored to a particular region of characteristic system behavior. Then, depending on the current state of the system, LDEIM selects an appropriate local subspace for the approximation of the nonlinear term. In this way, the dimensions of the local DEIM subspaces, and thus the computational costs, remain low even though the system might exhibit a wide range of behaviors as it passes through different regimes. LDEIM uses machine learning methods in the offline computational phase to discover these regions via clustering. Local DEIM approximations are then computed for each cluster. In the online computational phase, machine-learning-based classification procedures select one of these local subspaces adaptively as the computation proceeds. The classification can be achieved using either the system parameters or a low-dimensional representation of the current state of the system obtained via feature extraction. The LDEIM approach is demonstrated for a reacting flow example of an H2-Air flame. In this example, where the system state has a strong nonlinear dependence on the parameters, the LDEIM provides speedups of two orders of magnitude over standard DEIM.
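The offline/online split described above can be illustrated with a toy sketch in which plain k-means and a truncated SVD stand in for the paper's clustering and DEIM machinery; all names and parameters here are our assumptions, not the authors' code.

```python
import numpy as np

def build_local_bases(snapshots, n_clusters=2, rank=2, n_iter=20, seed=0):
    """Offline phase (sketch): cluster snapshot vectors, then compute one
    truncated POD basis (left singular vectors) per cluster."""
    rng = np.random.default_rng(seed)
    X = snapshots                         # shape (n_snapshots, dim)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):               # plain k-means iterations
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    bases = []
    for c in range(n_clusters):
        U, _, _ = np.linalg.svd(X[labels == c].T, full_matrices=False)
        bases.append(U[:, :rank])
    return centers, bases

def select_basis(state, centers, bases):
    """Online phase: pick the local basis whose cluster center is nearest
    to the current state (nearest-centroid classification)."""
    c = np.argmin(((centers - state) ** 2).sum(axis=1))
    return bases[c]
```

Approximating the nonlinear term with the selected local basis (rather than one global one) is what keeps the online subspace dimension, and hence the cost, low.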
K.: Hubness-based indicators for semi-supervised time-series classification
 In: Proc. 8th Japanese-Hungarian Symposium on Discrete Mathematics and Its Applications
, 2013
SUCCESS: A new approach for semi-supervised classification of time-series
Abstract

Cited by 2 (1 self)
The growing interest in time-series classification can be attributed to the rapidly increasing amount of temporal data collected by widespread sensors. Often, human experts may only review a small portion of all the available data. Therefore, the available labeled data may not be representative enough and semi-supervised techniques may be necessary. In order to construct accurate classifiers, semi-supervised techniques learn both from labeled and unlabeled data. In this paper, we introduce a novel semi-supervised time-series classifier based on constrained hierarchical clustering and dynamic time warping. We discuss our approach in the framework of graph theory and evaluate it on 44 publicly available real-world time-series datasets from various domains. Our results show that our approach substantially outperforms the state-of-the-art semi-supervised time-series classifier. The results are also justified by statistical significance tests.
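Dynamic time warping, the elastic distance the classifier builds on, can be sketched with the standard dynamic-programming recurrence; the constrained-clustering part of SUCCESS is not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local mismatch cost
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match step
    return cost[n, m]
```

Unlike the Euclidean distance, DTW can align sequences of different lengths and tolerate local time shifts, which is why it is the standard similarity for time-series k-NN and clustering.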
Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection
Abstract

Cited by 1 (0 self)
Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in kNN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic kNN method, the angle-based technique (ABOD) designed for high-dimensional data, the density-based local outlier factor (LOF) and influenced outlierness (INFLO) methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
Index Terms—Outlier detection, reverse nearest neighbors, high-dimensional data, distance concentration
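The reverse-neighbor count N_k on which the antihub idea rests is simple to compute. The sketch below scores points by negated N_k, so that antihubs (points rarely chosen as anyone's neighbor) rank as outliers; the paper's exact scoring variants may differ.

```python
import numpy as np

def antihub_scores(X, k=3):
    """Antihub-style outlier scores from reverse k-NN counts.

    N_k(x) = number of points that have x among their k nearest
    neighbors; low N_k (an antihub) is treated as outlying."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]   # k-NN lists for every point
    N_k = np.bincount(knn.ravel(), minlength=len(X))
    return -N_k.astype(float)            # higher score = more outlying
```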
THE RELATION OF HUBS TO THE DODDINGTON ZOO IN SPEAKER VERIFICATION
Abstract

Cited by 1 (0 self)
In speaker verification systems there exists the well-known phenomenon of speakers which are very problematic to verify and have been given various metaphoric animal names. Our work connects this so-called ‘Doddington zoo’ and the animals of the whole ‘biometric menagerie’ to the problem of ‘hubs’ in high-dimensional data spaces, which was recently the topic of a number of publications in the machine learning literature. Due to a general problem of measuring distances in high-dimensional data spaces, hub objects emerge which have a high similarity to a large number of data items. This is a novel aspect of the ‘curse of dimensionality’ which adversely affects classification and identification performance. In a series of experiments we try to understand the ‘Doddington zoo’ problem with respect to the notions of hubs and antihubs.
Index Terms — Speaker verification, normalization, hubs
Uncertainty propagation for noise robust speaker recognition: the case of NIST-SRE
, 2015
CLUSTERING WITH SHARED NEAREST NEIGHBOR-UNSCENTED TRANSFORM BASED ESTIMATION
Abstract
Subspace clustering grew out of grouping objects across all subspaces of a dataset. When clustering high-dimensional objects, the accuracy and efficiency of traditional clustering algorithms are poor, because data objects may belong to different clusters in different subspaces comprised of different combinations of dimensions. To overcome this issue, we implement a new technique, the Opportunistic Subspace and Estimated Clustering (OSEC) model, on high-dimensional data to improve accuracy in search retrieval. A further obstacle to clustering quality is hubness, a property of vector-space data characterized by the tendency of certain points (hubs) to lie at a small distance to many other points in high-dimensional spaces; it is associated with the phenomenon of distance concentration. Hubness adversely affects many machine-learning tasks, including classification, nearest-neighbor search, outlier detection, and clustering, and it interferes with automatically determining the number of clusters in the data. Subspace clustering provides efficient cluster validation, but it does not address hubness effectively. To overcome the hubness problem in subspace clustering of high-dimensional data, we employ nearest-neighbor machine-learning methods: a Shared Nearest Neighbor Clustering based on Unscented Transform (SNNCUT) estimation method is developed that handles hubness while determining the cluster structure. The core objective of SNNC is to find cluster points such that points within a cluster are more similar to each other than to points in other clusters. SNNCUT estimates the relative (probability) density in a nearest-neighbor region, obtaining a more robust definition of density.
SNNCUT handles overlapping situations using the unscented transform, calculating the statistical distance of a random variable that undergoes a nonlinear transformation. The experimental performance of SNNCUT and k-nearest-neighbor hubness in clustering is evaluated in terms of clustering quality, distance-measurement ratio, clustering time, and energy consumption.
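The shared-nearest-neighbor similarity at the core of this approach can be sketched as follows; the unscented-transform density estimation is not reproduced here, only the SNN building block, and the function name is ours.

```python
import numpy as np

def snn_similarity(X, k=3):
    """Shared-nearest-neighbor similarity: the number of k-nearest
    neighbors two points have in common. Because it is a rank-based
    secondary measure, it is more robust to distance concentration
    in high dimensions than the raw distances."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]   # k-NN list per point
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(set(knn[i]) & set(knn[j]))
            S[i, j] = S[j, i] = shared
    return S
```

Points in the same density region share many neighbors, so thresholding `S` yields a graph whose connected components give clusters without fixing their number in advance.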
The Role of Hubs in Cross-lingual Supervised Document Retrieval
Abstract
Information retrieval in multilingual document repositories is of high importance in modern text mining applications. Analyzing textual data is, however, not without associated difficulties. Regardless of the particular choice of feature representation, textual data is high-dimensional in its nature and all inference is bound to be somewhat affected by the well-known curse of dimensionality. In this paper, we have focused on one particular aspect of the dimensionality curse, known as hubness. Hubs emerge as influential points in the k-nearest neighbor (kNN) topology of the data. They have been shown to affect similarity-based methods in severely negative ways in high-dimensional data, interfering with both retrieval and classification. The issue of hubness in textual data has already been briefly addressed, but not in the context that we are presenting here, namely the multilingual retrieval setting. Our goal was to gain some insights into the cross-lingual hub structure and exploit it for improving the retrieval and classification performance. Our initial analysis has allowed us to devise a hubness-aware instance weighting scheme for the canonical correlation analysis procedure which is used to construct the common semantic space that allows cross-lingual document retrieval and classification. The experimental evaluation indicates that the proposed approach outperforms the baseline. This shows that hubs can indeed be exploited for improving the robustness of textual feature representations.
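One simple way to realize such a hubness-aware instance weighting, sketched under our own assumptions rather than as the paper's exact scheme, is to down-weight training instances by their "bad hubness", i.e. how often they appear among the k nearest neighbors of differently labeled points, before fitting the common-space projection.

```python
import numpy as np

def hubness_weights(X, y, k=3):
    """Hubness-aware instance weights (sketch): instances with high
    'bad hubness' are down-weighted, e.g. before fitting CCA on
    aligned bilingual corpora."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]
    n = len(X)
    bad = np.zeros(n)
    for i in range(n):
        for j in knn[i]:
            if y[j] != y[i]:
                bad[j] += 1   # j is a label-mismatched neighbor of i
    return 1.0 / (1.0 + bad)  # weight shrinks with bad hubness
```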