WebSets: Unsupervised Information Extraction approach to Extract Sets of Entities from the Web [Extended Abstract]
We propose an unsupervised information extraction system, which exploits the structured information in the form of HTML tables to build meaningful sets of entities belonging to certain categories. Due to redundancy on the Web, we believe that entities belonging to important categories will frequently co-occur in table columns. We present a clustering algorithm to cluster such frequently occurring entities
An Adaptive Ontology-Based Approach to Identify Correlation between Publications (2011)
In this paper, we propose an adaptive ontology-based approach for identifying related papers that meets most researchers’ practical needs. By searching the ontology, we can return a diverse set of papers that are explicitly and implicitly related to an input paper. Moreover, our approach does not rely on a known ontology. Instead, we build and update an ontology for a collection in any domain of interest. Being independent of any known ontology, our approach adapts much more readily to different domains.
WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.
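The two-stage idea in this abstract (cluster terms that share table columns, then name each cluster with Hearst patterns) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the clustering procedure, so a naive greedy merge of overlapping columns stands in for it here. The sample columns, the sample corpus, the function names, and the restriction to the single "X such as A, B and C" pattern with one-word instances are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins for table columns extracted from HTML pages.
COLUMNS = [
    ["India", "China", "Canada"],
    ["China", "France", "India"],
    ["Apple", "Mango", "Banana"],
]

# Hypothetical text corpus in which the Hearst pattern is searched.
CORPUS = (
    "Countries such as India, China and France signed the treaty. "
    "Fruits such as Mango and Banana are exported."
)

# One Hearst pattern only: "<concept> such as <list of instances>".
SUCH_AS = re.compile(r"(\w+) such as ([^.]+)")
SPLIT = re.compile(r",\s*|\s+and\s+|\s+or\s+")


def cluster_columns(columns):
    """Greedily merge columns that share at least one entity (naive stand-in)."""
    clusters = []
    for column in columns:
        merged = set(column)
        untouched = []
        for cluster in clusters:
            if cluster & merged:
                merged |= cluster          # overlapping cluster: absorb it
            else:
                untouched.append(cluster)  # disjoint cluster: keep as is
        untouched.append(merged)
        clusters = untouched
    return clusters


def hearst_pairs(text):
    """Count (concept, instance) pairs matched by the 'such as' pattern."""
    pairs = Counter()
    for concept, listing in SUCH_AS.findall(text):
        for piece in SPLIT.split(listing):
            words = piece.split()
            if words:
                # Simplification: only the first word is taken as the instance.
                pairs[(concept, words[0])] += 1
    return pairs


def label_cluster(cluster, pairs):
    """Return the concept with the most pattern matches inside the cluster."""
    votes = Counter()
    for (concept, instance), count in pairs.items():
        if instance in cluster:
            votes[concept] += count
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    pairs = hearst_pairs(CORPUS)
    for cluster in cluster_columns(COLUMNS):
        print(label_cluster(cluster, pairs), sorted(cluster))
```

On this toy input the first two columns merge because they share "India" and "China", and that cluster receives the label "Countries". "Canada" never appears in the sample text, so it gets its label only through cluster membership, which is the benefit of clustering before naming.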

