8 citations found.
H. Davulcu, G. Yang, M. Kifer, and I. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM Symposium on Principles of Database Systems (PODS), pages 136--144, 2000.


This paper is cited in the following contexts:
Information Extraction from Tree Documents by Learning Subtree.. - Chidlovskii (2003)   (1 citation)

....automata, it determines the automata structure by generalization from the training examples; in the case of weighted automata (HMMs), it learns the transition probabilities. To accommodate information extraction, these methods either enhance finite-state automata with extraction rules [4] or adopt the formalism of finite-state transducers [2; 7]. The global view approach benefits from grammatical inference methods that can learn finite-state automata and transducers from positive examples; however, they often require many annotated samples to achieve a reasonable ....

H. Davulcu, G. Yang, M. Kifer, and I.V. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM Symp. on Principles of Database Systems, pages 136--144, 2000.


Quixote: Building XML Repositories from Topic Specific Web.. - Chung, Gertz (2001)

....wrappers are hand-crafted wrappers, e.g. [HGMC97, SA99, CDSS98], which require users to specify how to extract information. Such wrapper techniques take too much human effort and are very sensitive to changes in the format of documents. The second-generation wrappers, e.g. [Ade98, AK97, DYKR00], learn extraction rules from examples given by users, but still require that the documents follow the same format or have only slight variations. Also, these approaches are inapplicable to documents from a large number of diverse data sources, which typically employ different document formats. ....

H. Davulcu, G. Yang, M. Kifer, and I. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In 19th ACM Symposium on Principles of Database Systems, 136--144, 2000.


Semantic Bookmarking for Non-Visual Web Access - Saikat Mukherjee Stony   Self-citation (Kifer Ramakrishnan)

No context found.

H. Davulcu, G. Yang, M. Kifer, and I. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM Symposium on Principles of Database Systems (PODS), pages 136--144, 2000.


Semantic Bookmarking for Non-Visual Web Access - Mukherjee, Ramakrishnan, Kifer (2004)   Self-citation (Kifer Ramakrishnan)

No context found.

H. Davulcu, G. Yang, M. Kifer, and I. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM Symposium on Principles of Database Systems (PODS), pages 136--144, 2000.


On Precision and Recall of Multi-Attribute Data.. - Yang, Mukherjee.. (2003)   Self-citation (Yang Ramakrishnan)

No context found.

H. Davulcu, G. Yang, M. Kifer, and I. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM Symposium on Principles of Database Systems (PODS), pages 136-144, 2000.


Design and Implementation of the Physical Layer in.. - Davulcu, Yang.. (2000)   Self-citation (Davulcu Yang Kifer Ramakrishnan)

....against the possibility of over-generalizing the initial OLL expressions. Resilience means that the expression will be able to locate the requisite object under a large class of variations in the page layout. Some of the techniques used to create unambiguous resilient expressions are detailed in [7]. To illustrate the idea, consider the second visible table in Figure 1 (below the Component Detail header). This table is actually part of a bigger, invisible table, so the initial OLL expression would be generated as follows: table; table:tr; tr:td; td:img; table;table;td; table; img; ....

....it is too brittle. It will get us the desired table for a particular instance of the page, but a page generated for a different catalog search request might look slightly different, and the above address might then point to a wrong item (or not point anywhere at all). This problem was addressed in [7]. Combining the techniques described in that work with other heuristics, we can create a much more resilient OLL expression: This expression says that in order to find the desired table, we must find a text object that matches the word Component at any level of nesting and then scan ....

H. Davulcu, G. Yang, M. Kifer, and I.V. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM Symposium on Principles of Database Systems, May 2000.
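The contrast drawn in the context above, between a fixed positional address and an expression anchored on the text "Component", can be illustrated with a minimal sketch. This is not OLL and not the method of the cited paper: the Node type, by_position, by_anchor and the two sample pages are hypothetical, and "scan forward to the next table" is an assumption, since the quoted context is cut off after "and then scan".

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical page-tree node: a tag, optional text, an id for tests."""
    tag: str
    text: str = ""
    ident: str = ""
    children: List["Node"] = field(default_factory=list)


def by_position(root: Node, path: List[int]) -> Optional[Node]:
    """Brittle lookup: follow fixed child indices from the root."""
    node = root
    for i in path:
        if i >= len(node.children):
            return None  # layout changed; the address points nowhere
        node = node.children[i]
    return node


def preorder(root: Node) -> Iterator[Node]:
    """Yield every node in document (pre-)order."""
    yield root
    for child in root.children:
        yield from preorder(child)


def by_anchor(root: Node, word: str, target_tag: str) -> Optional[Node]:
    """Assumed resilient lookup: find text containing `word` at any nesting
    depth, then return the first `target_tag` node that follows it."""
    seen_anchor = False
    for node in preorder(root):
        if not seen_anchor:
            seen_anchor = word in node.text
        elif node.tag == target_tag:
            return node
    return None


# Two hypothetical variants of the same result page.
page_a = Node("body", children=[
    Node("text", "Search results"),
    Node("table", ident="parts-a"),
    Node("text", "Component Detail"),
    Node("table", ident="detail-a"),
])
page_b = Node("body", children=[
    Node("text", "Search results"),
    Node("div", children=[Node("text", "Sponsored"), Node("table", ident="ads")]),
    Node("text", "Updated"),
    Node("table", ident="parts-b"),
    Node("div", children=[Node("text", "Component Detail")]),
    Node("table", ident="detail-b"),
])

if __name__ == "__main__":
    for page in (page_a, page_b):
        print(by_position(page, [3]).ident,
              by_anchor(page, "Component", "table").ident)
```

On page_a both lookups find detail-a; on page_b the positional path [3] lands on parts-b (the wrong item), while the anchored search still reaches detail-b even though its anchor text is now nested one level deeper.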


On the Power of Semantic Partitioning of Web Documents - Yang, Mukherjee, Tan.. (2003)   Self-citation (Davulcu Yang Ramakrishnan)

....learned with the table structure will fail to locate the news items. This raises the important question of whether it is possible to build resilient wrappers that can automatically adapt to structural changes in Web pages. In a previous work we had attacked this problem from a syntactic viewpoint [Davulcu et al. 2000]. A truly robust solution, however, will require semantic knowledge of the content in a Web page. The semantic partitioning ideas described in this paper can form the foundation for such a solution. To understand the idea at a high level, let us examine how a self-repairing wrapper based on ....

H. Davulcu, G. Yang, M. Kifer, and I.V. Ramakrishnan. Computational aspects of resilient data extraction from semistructured sources. In ACM PODS, May 2000.


Reverse Engineering for Web Data: From Visual to Semantic.. - Chung, Gertz (2002)   (9 citations)

No context found.

H. Davulcu, G. Yang, M. Kifer, and I. Ramakrishnan. Computational Aspects of Resilient Data Extraction from Semistructured Sources. In 19th ACM Symp. on Principles of Database Systems, 136--144, 2000.


CiteSeer.IST - Copyright Penn State and NEC