Aggregating inconsistent information: ranking and clustering
, 2005
Cited by 229 (17 self)
We address optimization problems in which we are given contradictory pieces of input information and the goal is to find a globally consistent solution that minimizes the number of disagreements with the respective inputs. Specifically, the problems we address are rank aggregation, the feedback arc set problem on tournaments, and correlation and consensus clustering. We show that for all these problems (and various weighted versions of them), we can obtain improved approximation factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a long-standing conjecture of Bang-Jensen and Thomassen and show that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.
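The objective in the abstract above, minimizing pairwise disagreements with the input rankings, can be sketched in a few lines. The abstract does not say which algorithm the authors use, so the quicksort-style pivoting below is only an assumed illustration of a "remarkably simple" randomized scheme; the function names are hypothetical.

```python
import random
from itertools import combinations


def kendall_disagreements(ranking_a, ranking_b):
    """Count the item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def total_disagreements(candidate, rankings):
    """Objective value: disagreements summed over all input rankings."""
    return sum(kendall_disagreements(candidate, r) for r in rankings)


def pivot_aggregate(rankings, seed=0):
    """Assumed illustration: pick a random pivot, place every other item
    before or after it by majority vote, then recurse on both sides."""
    rng = random.Random(seed)
    positions = [{item: i for i, item in enumerate(r)} for r in rankings]

    def majority_prefers(x, y):
        votes = sum(1 for pos in positions if pos[x] < pos[y])
        return 2 * votes > len(positions)

    def sort(items):
        if len(items) <= 1:
            return items
        pivot = rng.choice(items)
        before = [x for x in items if x != pivot and majority_prefers(x, pivot)]
        after = [x for x in items if x != pivot and not majority_prefers(x, pivot)]
        return sort(before) + [pivot] + sort(after)

    return sort(list(rankings[0]))


rankings = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
]
consensus = pivot_aggregate(rankings)
print(consensus, total_disagreements(consensus, rankings))
```

The sketch assumes every input ranking is a full ranking of the same items; partial inputs would need a different disagreement count.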
Learning to Attach Semantic Metadata to Web Services
- In Proc. Int. Semantic Web Conf
, 2003
Cited by 65 (10 self)
Emerging Web standards promise a network of heterogeneous yet interoperable Web Services. Web Services would greatly simplify the development of many kinds of data integration and knowledge management applications. Unfortunately, this vision requires that services describe themselves with large amounts of semantic metadata "glue". We explore a variety of machine learning techniques to semiautomatically create such metadata.
Measuring intrusion detection capability: An information-theoretic approach
- In Proceedings of ACM Symposium on Information, Computer and Communications Security (ASIACCS’06)
, 2006
Cited by 38 (3 self)
A fundamental problem in intrusion detection is what metric(s) can be used to objectively evaluate an intrusion detection system (IDS) in terms of its ability to correctly classify events as normal or intrusive. Traditional metrics (e.g., true positive rate and false positive rate) measure different aspects, but no single metric seems sufficient to measure the capability of intrusion detection systems. The lack of a single unified metric makes it difficult to fine-tune and evaluate an IDS. In this paper, we provide an in-depth analysis of existing metrics. Specifically, we analyze a typical cost-based scheme [6], and demonstrate that this approach is very confusing and ineffective when the cost factor is not carefully selected. In addition, we provide a novel information-theoretic analysis of IDS and propose a new metric that highly complements
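To make the contrast between the traditional rates and a single information-theoretic number concrete, the sketch below computes both from a confusion matrix. The excerpt is cut off before it defines the proposed metric, so the normalized mutual information I(X;Y)/H(X) used here is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math


def rates(tp, fn, fp, tn):
    """Traditional metrics: (true positive rate, false positive rate)."""
    return tp / (tp + fn), fp / (fp + tn)


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_mutual_information(tp, fn, fp, tn):
    """I(X;Y) / H(X), with X the ground truth and Y the IDS output.

    Assumed illustrative metric, not necessarily the one in the paper.
    Requires both intrusive and normal events, so that H(X) > 0.
    """
    total = tp + fn + fp + tn
    joint = [tp / total, fn / total, fp / total, tn / total]
    p_truth = [(tp + fn) / total, (fp + tn) / total]
    p_alarm = [(tp + fp) / total, (fn + tn) / total]
    h_truth = entropy(p_truth)
    mutual_information = h_truth + entropy(p_alarm) - entropy(joint)
    return mutual_information / h_truth


print(rates(8, 2, 9, 81))
print(normalized_mutual_information(8, 2, 9, 81))
```

A detector whose alarms are independent of the ground truth scores 0 and a perfect detector scores 1, which is the kind of single-number summary that the two separate rates do not provide.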
Aggregation of partial rankings, p-ratings and top-m lists
- ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2007
Cited by 37 (5 self)
We study the problem of aggregating partial rankings. This problem is motivated by applications such as meta-searching and information retrieval, search engine spam fighting, e-commerce, learning from experts, analysis of population preference sampling, committee decision making and more. We improve recent constant factor approximation algorithms for aggregation of full rankings and generalize them to partial rankings. Our algorithms improve the constant factor approximation with respect to all metrics discussed in Fagin et al.'s recent important work on comparing partial rankings. We pay special attention to two important types of partial rankings: the well-known top-m lists and the more general p-ratings, which we define. We provide the first evidence for hardness of aggregating them for constant m, p.
Exploiting Agreement and Disagreement of Human Annotators for Word Sense Disambiguation
- In Proceedings of RANLP 2003
, 2003
Cited by 25 (1 self)
It is generally agreed that the success of a Word Sense Disambiguation (WSD) system depends, in large part, on having enough sense-annotated data at hand and a well-motivated sense inventory into which the disambiguations are made.
Cluster generation and cluster labelling for web snippets: A fast and accurate hierarchical solution
- In Proceedings of the 13th Symposium on String Processing and Information Retrieval (SPIRE 2006)
, 2006
Cited by 14 (2 self)
This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms.
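The furthest-point-first heuristic for metric k-center clustering named above can be sketched as follows. This is the textbook greedy version, not the "fast version" the abstract refers to (whose speed-ups are not described here); the function name, the Euclidean default distance, and the choice of the first point as the initial center are assumptions.

```python
import math


def furthest_point_first(points, k, distance=math.dist):
    """Greedy k-center: repeatedly promote the point furthest from all
    current centers to a new center.

    Returns (center_indices, assignment), where assignment[i] is the
    position in center_indices of the center closest to points[i].
    """
    centers = [0]  # assumption: start from the first point
    nearest = [distance(p, points[0]) for p in points]
    assignment = [0] * len(points)
    while len(centers) < min(k, len(points)):
        new_center = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new_center)
        for i, p in enumerate(points):
            d = distance(p, points[new_center])
            if d < nearest[i]:
                nearest[i] = d
                assignment[i] = len(centers) - 1
    return centers, assignment


points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
centers, assignment = furthest_point_first(points, k=3)
print(centers, assignment)
```

Each round costs one pass over the points, so the whole procedure takes O(nk) distance evaluations, which fits the on-the-fly setting the abstract describes. For snippets, `distance` would be replaced by a metric on text representations.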
Ontology learning from text: A look back and into the future
- ACM Comput. Surv
Cited by 12 (0 self)
Ontologies are often viewed as the answer to the need for interoperable semantics in modern information systems. The explosion of textual information on the Read/Write Web coupled with the increasing demand for ontologies to power the Semantic Web have made (semi-)automatic ontology learning from text a very promising research area. This together with the advanced state in related areas, such as natural language processing, have fueled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium and discusses the remaining challenges that will define the research directions in this area in the near future.
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
, 2004
Cited by 10 (0 self)
rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, there still remains a large number of similarity functions that are currently unable to adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for their application to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus- Copyright © 2004, American Association for Artificial Intelli
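For readers unfamiliar with edit distance with affine gaps, the sketch below shows the standard fixed-cost dynamic program, in which opening a gap costs more than extending one. It is not the probabilistic, learnable model the excerpt describes; the cost parameters are hand-set assumptions standing in for what such a model would learn from training pairs, and the function name is hypothetical.

```python
import math


def affine_gap_distance(s, t, substitute=1.0, gap_open=2.0, gap_extend=0.5):
    """Minimum cost of aligning s to t when a gap of length L costs
    gap_open + (L - 1) * gap_extend. Costs are illustrative assumptions."""
    n, m = len(s), len(t)
    inf = math.inf
    # match[i][j]: best cost with s[i-1] aligned to t[j-1]
    # delete[i][j]: best cost with s[i-1] aligned to a gap
    # insert[i][j]: best cost with t[j-1] aligned to a gap
    match = [[inf] * (m + 1) for _ in range(n + 1)]
    delete = [[inf] * (m + 1) for _ in range(n + 1)]
    insert = [[inf] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0.0
    for i in range(1, n + 1):
        delete[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        insert[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if s[i - 1] == t[j - 1] else substitute
            match[i][j] = step + min(
                match[i - 1][j - 1], delete[i - 1][j - 1], insert[i - 1][j - 1]
            )
            delete[i][j] = min(
                match[i - 1][j] + gap_open,
                insert[i - 1][j] + gap_open,
                delete[i - 1][j] + gap_extend,
            )
            insert[i][j] = min(
                match[i][j - 1] + gap_open,
                delete[i][j - 1] + gap_open,
                insert[i][j - 1] + gap_extend,
            )
    return min(match[n][m], delete[n][m], insert[n][m])


print(affine_gap_distance("abcdef", "abf"))
```

With these costs, dropping the contiguous run "cde" is charged as one gap (2.0 + 0.5 + 0.5) rather than three separate deletions, which is why affine gaps suit record fields with omitted or abbreviated segments.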
Cluster generation and labeling for web snippets: A fast, accurate hierarchical solution
- Journal of Internet Mathematics
, 2006
Cited by 7 (0 self)
This paper describes Armil, a meta-search engine that groups the web snippets returned by auxiliary search engines into disjoint labeled clusters. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to his/her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labeling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and they use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labeling algorithms. On a standard desktop PC (AMD Athlon, 1 GHz clock, 750 Mbytes of RAM), Armil clusters and labels up to 200 snippets in less than one second.
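The labeling step relies on a variant of information gain that the abstract does not spell out. As a rough guide, the sketch below computes the plain, textbook information gain of a term with respect to membership in one cluster; the function names and the four-count interface are assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no variable with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(in_with, in_without, out_with, out_without):
    """Reduction in uncertainty about cluster membership from knowing
    whether a snippet contains the term.

    in_with / in_without: snippets inside the cluster with / without the term.
    out_with / out_without: the same counts for snippets outside the cluster.
    """
    total = in_with + in_without + out_with + out_without
    h_cluster = binary_entropy((in_with + in_without) / total)
    h_conditional = 0.0
    for n_in, n_all in (
        (in_with, in_with + out_with),
        (in_without, in_without + out_without),
    ):
        if n_all:
            h_conditional += (n_all / total) * binary_entropy(n_in / n_all)
    return h_cluster - h_conditional


print(information_gain(5, 0, 0, 5))  # term occurs in exactly the cluster's snippets
print(information_gain(2, 2, 2, 2))  # term is unrelated to the cluster
```

Terms with a high score separate the cluster from the rest of the result set, which makes them natural label candidates.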