26 citations found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. In International Conference on Very Large Databases (VLDB), pages 371--380, Rome, Italy, 2001.

This paper is cited in the following contexts:

Query Answering in Inconsistent Databases - Bertossi, Chomicki   (Correct)

....The issues addressed consist of detecting and solving conflicts inside the database and conflicts between answers to questionnaires and the intended declarative semantics of the latter, as opposed to conflicts between data and integrity constraints. This work is a specific case of data cleaning [45]. It has been widely recognized that in database integration, the integrated data may be inconsistent with the integrity constraints. A typical (theoretical) solution to the problem of database inconsistency in this context is augmenting the data model to represent disjunctive information. ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In 27th International Conference on Very Large Data Bases, pp. 371--380, 2001.


Record Linkage: Current Practice and Future Directions - Gu, Baxter, Vickers..   (Correct)

....et al. [34] present an alternative record linkage system model called the Induction Record Linkage Model. This model shows how training data can be incorporated in the record linkage system, where it is available. Other alternative record linkage systems that have been proposed include AJAX [40], WHIRL [24], Intelliclean [66], Merge Purge [48], and SchemaSQL [63]. Some of the designs and architectures for the many available academic, government and commercial systems are found in Appendix A. 3.2 Standardisation Methods Standardisation is also called data cleaning or attribute level ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative Data Cleaning: Language, Model and Algorithms. In Proc. of the 27th VLDB Conf., 2001.


Text Joins for Data Cleansing and Integration in an RDBMS - Gravano, Ipeirotis.. (2003)   (1 citation)  (Correct)

....pairs with estimated similarity exceeding threshold #. 3. Related Work The problem of approximate string matching has attracted interest in the algorithms and combinatorial pattern matching communities [8] and results from this area have been used for data integration and cleansing applications [4, 5]. The string edit distance [7] (with its numerous variants) has been frequently used for approximate string matching. Gravano et al. [5] presented a method to integrate approximate string matching via edit distance into a database and realize it as SQL statements. The information retrieval field ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB 2001.


Text Joins in an RDBMS for Web Data Integration - Gravano, Ipeirotis, Koudas.. (2003)   (3 citations)  (Correct)

....is that records fit in memory and/or that evaluation of the cross product of two files (and sometimes its materialization) is viable. This is not true with very large data collections. Approximate matching of strings is a problem of central interest for data integration and cleansing applications [9, 11]. The problem of approximate string matching has attracted interest in the algorithms and combinatorial pattern matching communities [19] and commonly the string edit distance (with its numerous variants) is adopted for approximate string match quantification. Gravano et al. [11] presented a ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In Proceedings of the 27th International Conference on Very Large Databases (VLDB 2001).


Interactive Deduplication using Active Learning - Sarawagi, Bhamidipaty (2002)   (16 citations)  (Correct)

....In the legend part of Figure 9 we show the precision and recall values after the last round. Along both these metrics separately active learning is close to the optimal approach. 5 Related work Recently, there has been renewed interest in the database community on the data cleaning problem [26, 9, 25] comprising several aspects, including, data segmentation, deduplication, outlier detection, standardization and schema mapping. For the specific problem of deduplication, most recent work [11, 22] has concentrated on the performance aspects assuming that the deduplication function is input by the ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model and algorithms. In Proc. of the 27th Int'l Conference on Very Large Databases (VLDB), pages 307--316, Rome, Italy, 2001.


Logics for Emerging Applications of Databases - Chomicki, Saake, van der Meyden (2003)   (Correct)

....The issues addressed consist of detecting and solving conflicts inside the database, and conflicts between answers to questionnaires and the intended, declarative semantics of the latter, as opposed to conflicts between data and integrity constraints. This work is a specific case of data cleaning [44]. It has been widely recognized that in database integration the integrated data may be inconsistent with the integrity constraints. A typical (theoretical) solution to the problem of database inconsistency in this context is to augment the data model to represent disjunctive information. ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In International Conference on Very Large Data Bases, pages 371--380, 2001.


Efficient Record Linkage in Large Data Sets - Liang Jin Chen (2003)   (2 citations)  (Correct)

....of different values. Next, field level similarity metrics need to be combined to determine overall similarity between two records. The second challenge is to provide user friendly interactive tools for users to specify different transformations, and use the feedback to improve the data quality [3, 5, 15]. The third challenge is that of scale. Often it is computationally prohibitive to apply a nested loop approach to use the similarity function(s) to compute the distance between each pair of records. This issue has previously been studied in [7] which proposed an approach that first merges two ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB, pages 371--380, 2001.


Efficient Record Linkage in Large Data Sets - Jin, Li, Mehrotra (2003)   (2 citations)  (Correct)

....(EM) methods and support vector machines have been proposed in the literature [4, 7, 35]. The second challenge in record linkage is to provide user friendly interactive tools for users to specify different transformations, and use the feedback to improve the data quality. Recently a few works [11, 13, 30] have been conducted to solve this problem. The third challenge is that of scale. A simple solution is to use a nested loop approach to generate the Cartesian product of records, and then use the similarity function(s) to compute the distance between each pair of records. This approach is ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB, pages 371--380, 2001.


Declarative Data Merging With Conflict Resolution - Naumann, Häussler (2002)   (1 citation)  (Correct)

....same object and can be merged. In the absence of such an ID, object identification techniques can be employed [13]. These techniques find duplicates automatically by evaluating the data that is available. Even though no method performs with perfect accuracy, satisfying results have been reported [7, 14, 5]. Ensuring correctness is, in effect, the problem of grouping tuples about the same real world object into a single tuple in the merged result. Our approach. In this paper we propose declarative merging of relational data by means of the SQL query language, so that merging can be performed by ....

....grouping) and it extends the SQL language to solve the problem of resolving conflicts among duplicates (user defined aggregation). Once these extensions are part of a DBMS and its optimizer, they harmonize with the findings here. Galhardas et al. present a framework for declarative data cleaning [5], implemented as the Ajax tool [1]. The framework encompasses all steps of data cleaning including operators for object identification and data merging. The authors suggest a proprietary merge operator grouping tuples over ID attributes and applying functions to resolve conflicts. Details of the ....

Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon and Cristian Saita. Declarative data cleaning: language, model, and algorithms. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 371-380, Rome, Italy, 2001.


Mining Variants of Rules Using the CrystalBall Framework - Ong, Ng, Lim (2001)   (Correct)

....value in doing so diminishes quickly as other aspects of KDD are weakly addressed and the analyst's goal lies in a section of the problem space. We therefore agree with [16, 18] to call for a focus on other aspects of KDD to realize the goal of data mining. Some works has been observed recently [3, 9, 33] and CrystalBall's contribution in this aspect is to eliminate the need for engineering new algorithms. Given the engine and the variant specification language, new parameters can be considered and compositions developed. This in turn eliminates programming, frees the analyst from the restrictions ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and CA. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of VLDB, Italy, 2001.


Hippocratic Databases - Agrawal, Kiernan, Srikant, Xu (2002)   (12 citations)  (Correct)

....while Bob's records would have three purposes: purchase, registration and recommendations. The set of purposes combined with the information in the privacy authorizations table will be used to restrict access. Data Preprocessing The Data Accuracy Analyzer may run some data cleansing functions [19] [41] against the data to check for accuracy either before or after data insertion, thus addressing the Principle of Accuracy. In our example, Alice's address may be checked against a database of street addresses to catch typos in the address or zip code. 4.2.3 Queries Queries are submitted to ....

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pages 371--380, 2001.


Efficient Development of Data Migration Transformations - Paulo Carreira Oblog   Self-citation (Galhardas)   (Correct)

No context found.

Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian-Augustin Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB'01), Rome, Italy, September 2001.


Declarative Data Cleaning: Language, Model, and Algorithms - Galhardas, Florescu, Shasha (2001)   Self-citation (Galhardas Florescu Shasha Simon Saita)   (Correct)

....This section gives a presentation by example of the five logical operators (mapping, view, matching, clustering, and merging) offered by our declarative language for expressing data cleaning transformations. A formal description of our operators and the BNF grammar for their syntax can be found in [7]. In [7] we also show how to express data cleaning transformations existing in commercial systems or introduced in the research literature as a composition of our operators. 3.1 Mapping Operator A mapping operator takes a single relation as input and produces one or more relations. It expresses ....

....gives a presentation by example of the five logical operators (mapping, view, matching, clustering, and merging) offered by our declarative language for expressing data cleaning transformations. A formal description of our operators and the BNF grammar for their syntax can be found in [7]. In [7], we also show how to express data cleaning transformations existing in commercial systems or introduced in the research literature as a composition of our operators. 3.1 Mapping Operator A mapping operator takes a single relation as input and produces one or more relations. It expresses ....

[Article contains additional citation context not shown here]

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. Extended version of the VLDB'01 paper, 2001.


Improving Data Cleaning Quality using a Data Lineage.. - Galhardas, Florescu.. (2001)   (7 citations)  Self-citation (Galhardas Florescu Shasha Simon Saita)   (Correct)

No context found.

Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian-Augustin Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In VLDB, Rome, Italy, September 2001.


A Duplicate Detection Benchmark for - Xml And Relational   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. In International Conference on Very Large Databases (VLDB), pages 371--380, Rome, Italy, 2001.


DogmatiX Tracks down Duplicates in XML - Melanie Weis Mweis (2005)   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. In International Conference on Very Large Databases, pages 371--380, Rome, Italy, 2001.


Enhancing Data Analysis with Noise Removal - Hui Xiong Member   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model and algorithms. In Proceedings of the 2001.


Enhancing Data Analysis with Noise Removal - Hui Xiong Student   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model and algorithms. In Proceedings of the 2001.


Using AutoMed for Data Warehousing - Hao Fan School   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB 2001.


Data Quality in Genome Databases - Müller, Naumann, Freytag (2003)   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, C.-A. Saita, Declarative data cleaning: Language, model, and algorithms, Proceedings of the 27


TAILOR: A Record Linkage Toolbox - Elfeky, Verykios, Elmagarmid (2002)   (11 citations)  (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the 27th VLDB Int. Conf. on Very Large Databases, Roma, Italy, September 2001.


Semantic Data Cleansing in Genome Databases - Müller   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, C.-A. Saita, Declarative data cleaning: Language, model, and algorithms, Proceedings Roma, Italy, 2001


Record Linkage: A Machine Learning Approach, A.. - Elfeky, Verykios, .. (2003)   (Correct)

No context found.

H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the 27th Int. Conf. on Very Large Databases, Roma, Italy, September 2001.


Eliminating Fuzzy Duplicates in Data Warehouses - Ananthakrishna, Chaudhuri, Ganti (2002)   (12 citations)  (Correct)

No context found.

Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian Saita. Declarative data cleaning: Language, model, and algorithms. In Proceedings of the 27th International Conference on Very Large Databases, pages 371--380, Roma, Italy, September 11-14 2001.

CiteSeer.IST - Copyright Penn State and NEC