Results 1 - 10 of 27
Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
Cited by 21 (8 self)
Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus addressing the issue of language ambiguity (i.e., polysemy). However, Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets. As a result, text snippets with no words in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this paper, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the induced word senses. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
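The two-stage idea in this abstract (first induce senses, then assign snippets to them) can be sketched as follows. This is a toy illustration, not the paper's graph-based WSI algorithm: the sense word sets and snippets below are made up, and real induced senses would come from WSI over raw text.

```python
# Sketch: assign search-result snippets to hypothetical induced word senses
# by lexical overlap with each sense's word cluster, instead of clustering
# snippets directly by pairwise snippet similarity.

def tokenize(text):
    return set(text.lower().split())

def assign_to_senses(snippets, senses):
    """senses: dict mapping a sense label -> set of words induced for it."""
    clusters = {label: [] for label in senses}
    for snippet in snippets:
        words = tokenize(snippet)
        # pick the sense whose induced word set overlaps the snippet most
        best = max(senses, key=lambda s: len(words & senses[s]))
        clusters[best].append(snippet)
    return clusters

# made-up senses for the ambiguous query "cat"
senses = {
    "animal": {"predator", "cat", "claw", "feline"},
    "unix":   {"command", "file", "print", "terminal"},
}
snippets = [
    "the cat is a feline predator",
    "use the cat command to print a file",
]
clusters = assign_to_senses(snippets, senses)
```

Note that the two snippets share the word "cat" yet land in different clusters, which is exactly the failure mode of shallow snippet similarity that the paper targets.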
Axiomatic construction of hierarchical clustering in asymmetric networks
- https://fling.seas.upenn.edu/∼ssegarra/wiki/index.php?n=Research.Publications
, 2012
Cited by 5 (1 self)
We present an axiomatic construction of hierarchical clustering in asymmetric networks, where the dissimilarity from node a to node b is not necessarily equal to the dissimilarity from node b to node a. The theory is built on the axioms of value and transformation, which encode desirable properties common to any clustering method. Two hierarchical clustering methods that abide by these axioms are derived: reciprocal and nonreciprocal clustering. We further show that any clustering method that satisfies the axioms of value and transformation lies between reciprocal and nonreciprocal clustering in a well-defined sense. We apply this theory to the formation of circles of trust in social networks.
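Of the two methods named above, reciprocal clustering admits a compact sketch: symmetrize each pair of dissimilarities with a max, then group nodes connected by a chain whose symmetrized hops all stay below the resolution (i.e., single linkage on max(d(a,b), d(b,a))). The three-node network below is a made-up illustration, not an example from the paper.

```python
# Sketch of reciprocal clustering on an asymmetric network: at resolution
# delta, merge nodes joined by a chain of edges whose symmetrized
# dissimilarity max(d(a,b), d(b,a)) is at most delta.

def reciprocal_clusters(nodes, d, delta):
    """d: dict mapping (a, b) -> dissimilarity from a to b; returns a partition."""
    sym = lambda a, b: max(d[(a, b)], d[(b, a)])
    # union-find over edges whose symmetrized dissimilarity <= delta
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n
    for a in nodes:
        for b in nodes:
            if a != b and sym(a, b) <= delta:
                parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(map(frozenset, groups.values()), key=min)

nodes = ["a", "b", "c"]
d = {("a", "b"): 1, ("b", "a"): 2,   # symmetrized: 2
     ("b", "c"): 5, ("c", "b"): 1,   # symmetrized: 5
     ("a", "c"): 5, ("c", "a"): 5}   # symmetrized: 5
```

Sweeping delta from small to large yields the dendrogram: a and b merge first (resolution 2), and c joins only at resolution 5, even though c reaches b cheaply in one direction.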
Classifying clustering schemes
- Foundations of Computational Mathematics
Model-based clustering for multivariate functional data
, 2012
Cited by 4 (0 self)
This paper proposes the first model-based clustering algorithm for multivariate functional data. After introducing multivariate functional principal components analysis (MFPCA), a parametric mixture model, based on the assumption of normality of the principal components, is defined and estimated by an EM-like algorithm. The main advantage of the proposed model is its ability to take into account the dependence among curves. Results on simulated and real datasets show the efficiency of the proposed method.
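The pipeline described above is: project curves onto functional principal components, then fit a Gaussian mixture to the scores with an EM-like algorithm. A minimal one-dimensional EM sketch follows (two components with unit variances held fixed for brevity; the scores are made up, not MFPCA output).

```python
import math

# Simplified EM for a two-component Gaussian mixture over 1-D scores:
# alternate computing responsibilities (E-step) and reweighting the
# component means and mixing proportions (M-step).

def em_two_gaussians(scores, iters=50):
    mu = [min(scores), max(scores)]      # crude initialization
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score
        resp = []
        for x in scores:
            lik = [p * math.exp(-0.5 * (x - m) ** 2) for p, m in zip(pi, mu)]
            z = sum(lik)
            resp.append([l / z for l in lik])
        # M-step: update means and mixing proportions from responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            pi[k] = nk / len(scores)
    return mu, pi

scores = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]   # illustrative, well-separated scores
mu, pi = em_two_gaussians(scores)
```

The full model in the paper additionally estimates component covariances over multivariate scores; the alternation shown here is the same skeleton.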
A Comparative Study of Various Clustering Algorithms
- in Data Mining", International Journal of Engineering Research and Applications (IJERA)
, 2012
Cited by 4 (0 self)
Data clustering is the process of putting similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is larger than between groups. This paper reviews six clustering techniques: k-means clustering, hierarchical clustering, DBSCAN, density-based clustering, OPTICS, and the EM algorithm. These techniques are implemented and analysed using the clustering tool WEKA, and the performance of the six techniques is presented and compared.
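Of the six techniques reviewed, k-means is the simplest to show in full: alternate assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. A one-dimensional toy sketch with k = 2 (illustrative data, not from the paper's WEKA experiments):

```python
# Lloyd's k-means iteration on 1-D data: assignment step, then update step.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point goes to the nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 12.0])
```

The empty-cluster guard (`if c else centroids[i]`) keeps a centroid in place when no points are assigned to it, which is one of several conventions tools like WEKA make configurable.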
Document Clustering Evaluation: Divergence from a Random Baseline
Cited by 4 (3 self)
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are performing useful work by preventing ineffective clusterings, which provide no useful result, from receiving high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality, including the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation; this paper describes its use in the context of document clustering evaluation.
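The core idea (subtract the score that a random clustering with the same cluster-size distribution would get) can be sketched with purity standing in for "any measure of cluster quality". Toy labels, not INEX data:

```python
import random

# Score a clustering, then subtract the mean score of random clusterings
# with the same cluster sizes, so a clustering that ignores the documents
# gains nothing.

def purity(assignment, labels):
    clusters = {}
    for c, l in zip(assignment, labels):
        clusters.setdefault(c, []).append(l)
    return sum(max(ls.count(l) for l in set(ls))
               for ls in clusters.values()) / len(labels)

def divergence_from_random(assignment, labels, trials=2000, seed=0):
    rng = random.Random(seed)
    shuffled = list(assignment)
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)            # same cluster sizes, documents scrambled
        baseline += purity(shuffled, labels)
    return purity(assignment, labels) - baseline / trials

labels = ["x", "x", "x", "y", "y", "y"]  # reference categories
good = [0, 0, 0, 1, 1, 1]                # matches the categories
useless = [0, 1, 0, 1, 0, 1]             # ignores the categories
```

The informative clustering keeps a large positive adjusted score, while the uninformative one drops to roughly zero even though its raw purity (about 0.67) looks respectable.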
Maximum volume clustering: A new discriminative clustering approach
- Journal of Machine Learning Research
, 2013
Cited by 3 (3 self)
The large volume principle proposed by Vladimir Vapnik, which advocates that hypotheses lying in an equivalence class with a larger volume are preferable, is a useful alternative to the large margin principle. In this paper, we introduce a new discriminative clustering model based on the large volume principle, called maximum volume clustering (MVC), and then propose two approximation schemes to solve this MVC model: a soft-label MVC method using sequential quadratic programming and a hard-label MVC method using semi-definite programming. The proposed MVC is theoretically advantageous for three reasons. First, the optimization involved in hard-label MVC is convex, and under mild conditions, the optimization involved in soft-label MVC is akin to a convex one in terms of the resulting clusters. Secondly, the soft-label MVC method pos-
An Empirical Study of Cluster Evaluation Metrics using Flow Cytometry Data
Cited by 3 (1 self)
A wide range of abstract characteristics of partitions have been proposed for cluster evaluation. We empirically evaluated the performance of these metrics on flow cytometry data and found that the set-matching metrics perform closest to human judgement.
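The "set-matching" family favoured by the study scores a clustering by pairing each cluster with one reference class and counting the matched items. A greedy one-to-one matching sketch (an optimal matching would use the Hungarian algorithm; the data is illustrative, not flow cytometry):

```python
# Greedy set-matching score: repeatedly match the cluster/class pair with
# the largest remaining overlap, then report matched items / total items.

def set_matching_score(clusters, classes, n):
    """clusters, classes: dicts mapping label -> set of item ids."""
    pairs = sorted(((len(c & k), ci, ki)
                    for ci, c in clusters.items()
                    for ki, k in classes.items()), reverse=True)
    used_c, used_k, matched = set(), set(), 0
    for overlap, ci, ki in pairs:
        if ci not in used_c and ki not in used_k:
            used_c.add(ci)
            used_k.add(ki)
            matched += overlap
    return matched / n

clusters = {"c1": {1, 2, 3}, "c2": {4, 5, 6}}
classes  = {"A":  {1, 2, 4}, "B":  {3, 5, 6}}
score = set_matching_score(clusters, classes, n=6)
```

Here c1 pairs with A and c2 with B, matching 4 of the 6 items, so the score is 4/6; unmatched items are exactly the disagreements a human would flag when comparing the two partitions.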
Supervised Clustering
Cited by 3 (1 self)
Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
Multi-task learning for improved discriminative training in SMT,”
- in Proceedings of the Eighth Workshop on Statistical Machine Translation,
, 2013
Cited by 2 (2 self)
Multi-task learning has been shown to be effective in various applications, including discriminative SMT. We present an experimental evaluation of whether multi-task learning depends on a "natural" division of data into tasks that balance shared and individual knowledge, or whether its inherent regularization makes multi-task learning a broadly applicable remedy against overfitting. To investigate this question, we compare "natural" tasks defined as sections of the International Patent Classification versus "random" tasks defined as random shards in the context of patent SMT. We find that both versions of multi-task learning improve equally well over independent and pooled baselines, and gain nearly 2 BLEU points over standard MERT tuning.