6 citations found.
W. Cohen. WHIRL: A word-based information representation language. Artificial Intelligence, 118:163-196, 2000.

This paper is cited in the following contexts:
Information Extraction from Structured Documents.. - Kosala, Blockeel, ..

....We classify an IE system as an automatically built system if the wrapper is built only once and can be used for new extraction tasks directly, or if wrappers can be built for each new task using unsupervised training only. Some examples of the IE systems in this category are as follows. WHIRL [11] is a soft logic system that incorporates a notion of textual similarity developed in the information retrieval community. WHIRL has been used to implement some heuristics that are useful for IE in [13]. Hemnani and Bressan [25] proposed a tree alignment algorithm that are based on two ....

W. Cohen. Whirl: A word-based information representation language. Artificial Intelligence, 118:163-196, 2000.


An expressive and efficient language for XML information retrieval - Kushmerick (2001)   (1 citation)

....such as "find books and CDs with similar titles". In some of these languages keywords are used merely as boolean filters without support for true ranked retrieval; others permit similarity calculations only between a data value and a constant, and thus cannot express the above query. WHIRL [Coh98, Coh00] avoids both problems, but assumes relational data. We propose ELIXIR, an expressive and efficient language for XML information retrieval that extends XML-QL with a textual similarity operator. This operator can be used for similarity joins, so ELIXIR is sufficiently expressive to handle the sample ....

....the full cross product of variable bindings when evaluating similarity joins. We have designed an efficient algorithm for answering ELIXIR queries that avoids unnecessarily computing such variable binding cross products; see Figure 3. Specifically, the ELIXIR query processing algorithm invokes WHIRL [Coh98, Coh00] as a subroutine. As described in Section 2.2, WHIRL extends Datalog (e.g. [UW97]) with a textual similarity metric. Our query processor rewrites an ELIXIR query Q into a series of XML-QL queries that generate intermediate relational data, and then invokes WHIRL to efficiently evaluate Q's ....

[Article contains additional citation context not shown here]

W. Cohen. WHIRL: A word-based information representation language. J. Arti


Expressive and efficient ranked querying of XML data - Chinenyanga, Kushmerick (2001)   (5 citations)

....would generate the full cross product of the variable bindings, and then compute the similarity of every pair. To avoid this problem, the ELIXIR query processing algorithm rewrites a query Q into a series of XML-QL queries that generate intermediate relational data, and then invokes WHIRL [2, 3] to efficiently evaluate Q's similarity predicates on this intermediate data. The resulting WHIRL tuples are then translated into the XML structure specified by Q using a final XML-QL query. 2 Related work. In this section, we discuss related work, in order to both situate ELIXIR relative to other ....

....can be nested, with an entire XML-QL query embedded inside the CONSTRUCT C clause. However, the ELIXIR query language is an extension to the non-nested subset of XML-QL. WHIRL. ELIXIR uses XML-QL to query XML data in its native form, producing intermediate relational data, and then invokes WHIRL [2, 3] to evaluate the similarity predicates. WHIRL efficiently answers relational queries containing textual similarity predicates, including similarity joins. A WHIRL query is a conjunction of relations, relational predicates, and similarity predicates. For example, the following WHIRL query outputs ....
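
The idea of a similarity join that avoids the full cross product can be illustrated with a minimal Python sketch. It is not WHIRL's query processor and not the ELIXIR algorithm: the function names (`tokenize`, `tfidf_vectors`, `similarity_join`), the particular TF-IDF weighting, and the choice to compute document frequencies over both columns together are all assumptions made for illustration. The sketch scores pairs by cosine similarity of TF-IDF vectors and uses an inverted index so that only pairs sharing at least one token are ever scored.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors (token -> weight dicts) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # One common TF-IDF variant (an assumption, not taken from the excerpt).
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def similarity_join(left, right, k=3):
    """Return up to k best (score, left_value, right_value) pairs by cosine similarity."""
    vecs = tfidf_vectors(left + right)  # simplification: one shared vocabulary
    lvecs, rvecs = vecs[:len(left)], vecs[len(left):]

    # Inverted index over the right column: token -> [(row, weight), ...].
    index = defaultdict(list)
    for j, vec in enumerate(rvecs):
        for t, w in vec.items():
            index[t].append((j, w))

    # Accumulate dot products only for pairs that share a token,
    # so the full cross product is never materialized.
    scores = defaultdict(float)
    for i, vec in enumerate(lvecs):
        for t, w in vec.items():
            for j, w2 in index.get(t, ()):
                scores[(i, j)] += w * w2

    ranked = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
    return [(s, left[i], right[j]) for (i, j), s in ranked]
```

For example, joining `["Daniel S. Weld", "William Cohen"]` against `["Dan Weld", "W. Cohen", "Nicholas Kushmerick"]` yields only the two pairs that share a token; "Nicholas Kushmerick" is never scored because it has no token in common with either left-hand value.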

[Article contains additional citation context not shown here]

W. Cohen. WHIRL: A word-based information representation language. J. Artificial Intelligence, 118(163-196), 2000.


Intelligent Internet Systems - Levy, Weld (2000)   (16 citations)

....objects which are named slightly differently at different sites (or even at the same site). For example, how does one determine that Dan Weld is the same individual as Daniel S. Weld? While this question has been considered at length in the database literature, Cohen's paper in this issue [26] offers a promising new approach. 3.3. Evaluation. Both the Google search engine [18] and Kleinberg's hub and authority model [65] use hypertext link structure to estimate the overall quality of a Web page, but we know of no work that attempts to automatically evaluate the accuracy, reliability ....

....traffic. This problem has been considered in several works [56,61,118,123]. The paper by Ambite and Knoblock in this issue [5] presents an algorithm for query optimization that combines the reformulation and optimization phases using a transformational approach. The paper by Cohen in this issue [26] describes the WHIRL system that considers the problem of quickly obtaining the first few answers to the query. WHIRL focuses on the important case where matching object names between different sources may require fuzzy matches, rather than exact matches. The BIG system, described in this issue's ....

W.W. Cohen, WHIRL: A word-based information representation language, Artificial Intelligence 118 (2000) 163--196 (this issue).


Intelligent Internet Systems - Levy, Weld (2000)   (16 citations)

....match objects which are named slightly differently at different sites (or even at the same site). For example, how does one determine that Dan Weld is the same individual as Daniel S. Weld? While this question has been considered at length in the database literature, Cohen's paper in this issue [26] offers a promising new approach. 3.3 Evaluation. Both the Google search engine [18] and Kleinberg's hub and authority model [66] use hypertext link structure to estimate the overall quality of a web page, but we know of no work that attempts to automatically evaluate the accuracy, reliability or ....

....traffic. This problem has been considered in several works [57, 124, 119, 62]. The paper by Ambite and Knoblock in this issue [4] presents an algorithm for query optimization that combines the reformulation and optimization phases using a transformational approach. The paper by Cohen in this issue [26] describes the WHIRL system that considers the problem of quickly obtaining the first few answers to the query. WHIRL focuses on the important case where matching object names between different sources may require fuzzy matches, rather than exact matches. The BIG system, described in this issue's ....

W.W. Cohen. Whirl: A word-based information representation language. Artificial Intelligence, this issue, 2000.


Learning to Match and Cluster Large High-Dimensional Data.. - Cohen, Richman (2002)   (15 citations)  Self-citation (Cohen)

....to join information across of pair of relations from different databases, and clustering is important in removing duplicates from a relation that has been drawn from the union of many different information sources. Previous work in this area includes work in distance functions for matching [14, 3, 9, 8] and scalable matching [2] and clustering [13] algorithms. Work in record linkage [15, 10, 21, 20, 7] is similar but does not rely as heavily on textual similarities. In this paper we synthesize many of these ideas. We present techniques for entity name matching and clustering that are scalable ....

....PossibleCenters = A; in Step 3b, let $\mathrm{Canopy}(a) = \{(a, b) : b \in B \text{ and } \mathrm{approxDist}(a, b) < T_{\mathrm{loose}}\}$; and in Step 3d, let $T_{\mathrm{tight}} = 0$ (i.e. only remove a from the set of PossibleCenters). A functionally equivalent but somewhat more efficient approach would be to use a soft join algorithm [3]. Learning a pairing function and construction of the graph G is identical. The greedy agglomerative clustering step, SubstringMatch true iff one of the two strings is a substring of the other. PrefixMatch true iff one of the strings is a prefix of the other. EditDistance(k) for $k \in \{0.5, 1, 2,$ ....
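
The string-pair features named in this excerpt can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, `edit_distance` is plain Levenshtein distance with unit costs, and `edit_distance_within` treats k as a simple upper bound on that distance. The excerpt is truncated and does not show how the paper interprets EditDistance(k), in particular for k = 0.5, so that threshold reading is a guess.

```python
def substring_match(a, b):
    """True iff one of the two strings is a substring of the other."""
    return a in b or b in a


def prefix_match(a, b):
    """True iff one of the strings is a prefix of the other."""
    return a.startswith(b) or b.startswith(a)


def edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/substitute costs (assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def edit_distance_within(a, b, k):
    """Hypothetical reading of EditDistance(k): distance is at most k."""
    return edit_distance(a, b) <= k
```

Each function returns a boolean or a number for one pair of strings, so the results can be collected into a feature vector per candidate pair before a pairing function is learned.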

William W. Cohen. WHIRL: A word-based information representation language. Artificial Intelligence, 118:163--196, 2000.

CiteSeer.IST - Copyright Penn State and NEC