Results 1 - 10 of 132
Collective knowledge systems: Where the social web meets the semantic web
Web Semantics: Science, Services and Agents on the World Wide Web, 2008
Cited by 111 (0 self)
Abstract: What can happen if we combine the best ideas from the Social Web and Semantic Web? The Social Web is an ecosystem of participation, where value is created by the aggregation of many individual user contributions. The Semantic Web is an ecosystem of data, where value is created by the integration of structured data from many sources. What applications can best synthesize the strengths of these two approaches, to create a new level of value that is both rich with human participation and powered by well-structured information? This paper proposes a class of applications called collective knowledge systems, which unlock the "collective intelligence" of the Social Web with knowledge representation and reasoning techniques of the Semantic Web.
Automatically Refining the Wikipedia Infobox Ontology
2008
Cited by 102 (7 self)
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly-structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia’s infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia’s infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.
Open information extraction using Wikipedia
In Proc. 48th Annu. Meeting Assoc. Comput. Linguist., 2010
Cited by 96 (3 self)
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner’s precision and recall. The key to WOE’s performance is a novel form of self-supervised learning for open extractors: using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE’s extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
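The heuristic-matching idea in this abstract can be sketched in a few lines: pair each infobox attribute with article sentences that mention both the article's subject and the attribute's value. This is a minimal illustration of the self-supervision concept only; all function names and data below are invented for the example, not taken from the WOE paper.

```python
# Sketch: build (sentence, relation) training pairs by matching a Wikipedia
# infobox attribute value against sentences of the same article.
# All names and example data here are illustrative, not from the paper.

def build_training_pairs(infobox, sentences, subject):
    """Pair each infobox (attribute, value) with article sentences that
    mention both the article's subject and the attribute value."""
    pairs = []
    for attribute, value in infobox.items():
        for sent in sentences:
            if subject in sent and value in sent:
                pairs.append((sent, attribute, subject, value))
    return pairs

infobox = {"spouse": "Michelle Obama", "birth_place": "Honolulu"}
sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama married Michelle Obama in 1992.",
]
pairs = build_training_pairs(infobox, sentences, "Barack Obama")
```

A real system would of course match values fuzzily (dates, aliases, partial names) rather than by exact substring, which is where the "heuristic" in heuristic matching comes in.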
Annotating and Searching Web Tables Using Entities, Types and Relationships
Cited by 55 (2 self)
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from “organic” Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than make separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DBpedia, tables from Wikipedia, and over 25 million HTML tables from a 500 million page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
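To see what the "local decisions" baseline the paper argues against looks like, here is a much-simplified column-type annotator that labels a column with the majority type of the entities its cells resolve to. The catalog and helper names are invented for illustration; the paper's actual contribution is a joint graphical model over entities, types and relations, not this per-column vote.

```python
from collections import Counter

# Toy entity -> type catalog (stand-in for YAGO/DBpedia; purely illustrative).
ENTITY_TYPES = {
    "Paris": "City", "Berlin": "City", "Tokyo": "City",
    "France": "Country", "Mercury": "Planet",
}

def annotate_column(cells):
    """Label a table column with the most common type among the entities
    its cells resolve to. This is a local baseline; the paper's model makes
    entity, type and relation decisions jointly for the whole table."""
    types = [ENTITY_TYPES[c] for c in cells if c in ENTITY_TYPES]
    if not types:
        return None
    return Counter(types).most_common(1)[0][0]

col_type = annotate_column(["Paris", "Berlin", "France", "Tokyo"])
```

The weakness of this baseline is exactly what motivates joint inference: a cell like "Paris" is ambiguous on its own, and the best disambiguation depends on the types chosen for the rest of the column and the relations chosen for neighboring columns.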
Information extraction from Wikipedia: Moving down the long tail
Proceedings of KDD 2008
Cited by 50 (9 self)
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia’s long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
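The first technique in the list, shrinkage over a subsumption taxonomy, amounts to pooling training examples from ancestor classes when a sparse class lacks data of its own. The sketch below shows only that pooling step, with an invented taxonomy and threshold; the paper's full method also weights ancestor data rather than treating it as equal to class-local data.

```python
# Sketch: "shrinkage" pools training examples from ancestors in a
# subsumption taxonomy when a sparse class has too little data of its own.
# Taxonomy, example lists and the min_size threshold are illustrative.

PARENT = {"jazz_musician": "musician", "musician": "person", "person": None}
EXAMPLES = {
    "jazz_musician": ["ex1"],
    "musician": ["ex2", "ex3"],
    "person": ["ex4", "ex5", "ex6"],
}

def pooled_examples(cls, min_size=4):
    """Walk up the taxonomy, adding ancestor examples until at least
    min_size training examples have been collected."""
    pool = []
    while cls is not None:
        pool.extend(EXAMPLES.get(cls, []))
        if len(pool) >= min_size:
            break
        cls = PARENT[cls]
    return pool

pool = pooled_examples("jazz_musician")
```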
Temporal information extraction
In Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, 2010
Cited by 50 (1 self)
Research on information extraction (IE) seeks to distill relational tuples from natural language text, such as the contents of the WWW. Most IE work has focused on identifying static facts, encoding them as binary relations. This is unfortunate, because the vast majority of facts are fluents, only holding true during an interval of time. It is less helpful to extract PresidentOf(Bill-Clinton, USA) without the temporal scope 1/20/93 to 1/20/01. This paper presents TIE, a novel information-extraction system, which distills facts from text while inducing as much temporal information as possible. In addition to recognizing temporal relations between times and events, TIE performs global inference, enforcing transitivity to bound the start and ending times for each event. We introduce the notion of temporal entropy as a way to evaluate the performance of temporal IE systems and present experiments showing that TIE outperforms three alternative approaches.
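The transitivity enforcement mentioned here can be illustrated with qualitative "before" constraints: if A is before B and B is before C, global inference must also conclude A is before C, which in turn tightens the bounds on each event's start and end times. The fixed-point closure below is a toy version of that single rule, with invented event names; TIE's actual inference is probabilistic and handles a richer set of temporal relations.

```python
# Sketch: enforce transitivity over pairwise "a before b" constraints by
# computing the closure. Event names and constraints are illustrative.

def transitive_closure(before):
    """before: set of (a, b) pairs meaning event a precedes event b.
    Repeatedly apply (a before b) and (b before c) => (a before c)
    until no new pairs are derived."""
    closed = set(before)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

constraints = {("inauguration", "speech"), ("speech", "dinner")}
closed = transitive_closure(constraints)
```

In a real temporal IE pipeline, each derived ordering constraint narrows the feasible interval for an event's start and end times, which is what reduces the "temporal entropy" the abstract proposes as an evaluation measure.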
Discovering Unknown Connections - the DBpedia Relationship Finder
In Proceedings of the 1st Conference on Social Semantic Web (CSSW 2007), volume 113 of LNI, 2007
Cited by 32 (5 self)
Abstract: The Relationship Finder is a tool for exploring connections between objects in a Semantic Web knowledge base. It offers a new way to get insights about elements in an ontology, in particular for large amounts of instance data. For this reason, we applied the idea to the DBpedia data set, which contains an enormous amount of knowledge extracted from Wikipedia. We describe the workings of the Relationship Finder algorithm and present some interesting statistical discoveries about DBpedia and Wikipedia.
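The core of a connection finder like this can be sketched as breadth-first search over the triple graph, returning the shortest chain of triples linking two resources. The triples and function name below are invented for the example; the actual Relationship Finder operates over the full DBpedia data set with considerably more engineering.

```python
from collections import deque

# Toy RDF-style (subject, predicate, object) triples; data is illustrative.
TRIPLES = [
    ("Leipzig", "locatedIn", "Germany"),
    ("Germany", "memberOf", "EU"),
    ("Berlin", "capitalOf", "Germany"),
]

def find_connection(start, goal):
    """Breadth-first search over the undirected triple graph, returning
    the shortest chain of triples connecting two resources."""
    adj = {}
    for s, p, o in TRIPLES:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))  # traverse edges both ways
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, pred, nxt)]))
    return None  # no connection in this graph

path = find_connection("Leipzig", "EU")
```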
Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language
In SIGMOD ’10: Proceedings of the International Conference on Management of Data, 2010
Cited by 28 (1 self)
Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KB), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs with conventional structured query languages is difficult due to the sheer size of schema information available to the user. We address this challenge by proposing a new query language that blends keyword search with structured query processing over large information graphs with rich semantics.
Using Wikipedia to Bootstrap Open Information Extraction
Cited by 22 (0 self)
We often use ‘Data Management’ to refer to the manipulation of relational or semi-structured information, but much of the world’s data is unstructured, for example the vast amount of natural-language text on the Web. The ability to manage
Semplore: An IR Approach to Scalable Hybrid Query of Semantic Web Data
Cited by 22 (9 self)
Abstract. As an extension to the current Web, the Semantic Web will not only contain structured data with machine-understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index Semantic Web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer’s product information.
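The reuse of IR index structures described in this abstract can be illustrated with posting lists: if each entity is treated as a "document" whose terms include both its types and the words of its textual description, then a hybrid query reduces to an ordinary posting-list intersection. The index contents and `type:`/`word:` term convention below are invented for the example and are not Semplore's actual index layout.

```python
# Sketch: a hybrid query as posting-list intersection over an inverted
# index whose terms mix structured (type:) and keyword (word:) entries.
# Index contents and the term-naming convention are illustrative.

INDEX = {
    "type:City":    {"Paris", "Berlin", "Kyoto"},
    "word:capital": {"Paris", "Berlin", "Tokyo"},
    "word:river":   {"Paris", "Kyoto"},
}

def hybrid_query(*terms):
    """Intersect posting lists for structured and keyword terms alike,
    exactly as an IR engine would intersect ordinary term postings."""
    postings = [INDEX.get(t, set()) for t in terms]
    result = set.intersection(*postings) if postings else set()
    return sorted(result)

hits = hybrid_query("type:City", "word:capital")
```

Because both kinds of term share one index, the same engine machinery (and its scalability) serves structured constraints and free-text keywords in a single query, which is the point of the paper's approach.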