Open information extraction from the web (2007)

by Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni
Venue: IJCAI
Citations: 373 (39 self)

BibTeX

@INPROCEEDINGS{Banko07openinformation,
    author = {Michele Banko and Michael J Cafarella and Stephen Soderland and Matt Broadhead and Oren Etzioni},
    title = {Open information extraction from the web},
    booktitle = {IJCAI},
    year = {2007},
    pages = {2670--2676}
}


Abstract

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER’s 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
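The pipeline the abstract describes (one pass over the corpus, relation-independent tuples, a probability per tuple, and an index for queries) can be illustrated with a toy sketch. This is not TEXTRUNNER's method: the capitalized-phrase regex, the noisy-or scoring, the 0.5 per-sentence precision, and every function name below are assumptions made up for illustration only.

```python
import re
from collections import Counter, defaultdict

# Assumption: an "entity" is a run of capitalized words, and a "relation"
# is the run of lowercase words between two entities. Real Open IE systems
# use far richer linguistic features than this toy pattern.
NAME = r"\b[A-Z]\w*(?: [A-Z]\w*)*"
TRIPLE = re.compile(rf"({NAME}) ((?:[a-z]+ )+)({NAME})")


def extract_tuples(corpus):
    """Single pass over the corpus; no relation names are specified up front."""
    counts = Counter()
    for sentence in corpus:
        for arg1, rel, arg2 in TRIPLE.findall(sentence):
            counts[(arg1, rel.strip(), arg2)] += 1
    return counts


def assign_probability(count, per_sentence_precision=0.5):
    """Noisy-or over supporting sentences (a stand-in scoring rule)."""
    return 1.0 - (1.0 - per_sentence_precision) ** count


def build_index(counts):
    """Map each lowercased argument or relation string to its scored tuples."""
    index = defaultdict(list)
    for (arg1, rel, arg2), n in counts.items():
        scored = (arg1, rel, arg2, assign_probability(n))
        for key in {arg1.lower(), rel.lower(), arg2.lower()}:
            index[key].append(scored)
    return index


corpus = [
    "Seattle is located in Washington.",
    "Seattle is located in Washington and borders Puget Sound.",
    "Paris is the capital of France.",
]
counts = extract_tuples(corpus)
index = build_index(counts)
for row in index["seattle"]:
    print(row)
```

Because the relation string is taken from the text rather than from a predefined list, adding a new kind of fact requires no new rules in this sketch, which mirrors the scaling argument the abstract makes against per-relation manual effort.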

Keyphrases

open information extraction, target relation, probability tuples, web page corpus, abstract assertion, relational tuples, information extraction, single data-driven pass, hand-tag new training example, new extraction paradigm, comparable set, scalable oie system, new extraction rule, efficient extraction, concrete fact, pre-specified request, state-of-the-art web ie system, new domain, open ie, human input, user query, manual labor scale, large set, small homogeneous corpus, pre-specified relation, error reduction
