| H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. In International Conference on Very Large Databases (VLDB), pages 371--380, Rome, Italy, 2001. |
....The issues addressed consist of detecting and solving conflicts inside the database and conflicts between answers to questionnaires and the intended declarative semantics of the latter, as opposed to conflicts between data and integrity constraints. This work is a specific case of data cleaning [45]. It has been widely recognized that in database integration, the integrated data may be inconsistent with the integrity constraints. A typical (theoretical) solution to the problem of database inconsistency in this context is augmenting the data model to represent disjunctive information. ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In 27th International Conference on Very Large Data Bases, pp. 371--380, 2001.
....et al. [34] present an alternative record linkage system model called the Induction Record Linkage Model. This model shows how training data can be incorporated in the record linkage system, where it is available. Other alternative record linkage systems that have been proposed include AJAX [40], WHIRL [24], IntelliClean [66], Merge/Purge [48], and SchemaSQL [63]. Some of the designs and architectures for the many available academic, government and commercial systems are found in Appendix A. 3.2 Standardisation Methods Standardisation is also called data cleaning or attribute level ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative Data Cleaning: Language, Model and Algorithms. In Proc. of the 27th VLDB Conf., 2001.
....pairs with estimated similarity exceeding threshold #. 3. Related Work The problem of approximate string matching has attracted interest in the algorithms and combinatorial pattern matching communities [8], and results from this area have been used for data integration and cleansing applications [4, 5]. The string edit distance [7] (with its numerous variants) has been frequently used for approximate string matching. Gravano et al. [5] presented a method to integrate approximate string matching via edit distance into a database and realize it as SQL statements. The information retrieval field ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB 2001.
....is that records fit in memory and/or that evaluation of the cross product of two files (and sometimes its materialization) is viable. This is not true with very large data collections. Approximate matching of strings is a problem of central interest for data integration and cleansing applications [9, 11]. The problem of approximate string matching has attracted interest in the algorithms and combinatorial pattern matching communities [19], and commonly the string edit distance (with its numerous variants) is adopted for approximate string match quantification. Gravano et al. [11] presented a ....
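The two contexts above name the string edit distance as the usual measure for approximate string matching. As an illustration only, here is a minimal sketch of its plain unit-cost (Levenshtein) variant; it is not the SQL-based realization by Gravano et al. that the contexts cite, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the dynamic-programming table are kept, so memory is linear in the length of `b`; the running time is still proportional to the product of the two string lengths.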
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In Proceedings of the 27th International Conference on Very Large Databases (VLDB 2001).
....In the legend part of Figure 9 we show the precision and recall values after the last round. Along each of these metrics separately, active learning is close to the optimal approach. 5 Related work Recently, there has been renewed interest in the database community in the data cleaning problem [26, 9, 25], comprising several aspects, including data segmentation, deduplication, outlier detection, standardization and schema mapping. For the specific problem of deduplication, most recent work [11, 22] has concentrated on the performance aspects, assuming that the deduplication function is input by the ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model and algorithms. In Proc. of the 27th Int'l Conference on Very Large Databases (VLDB), pages 371--380, Rome, Italy, 2001.
....The issues addressed consist of detecting and solving conflicts inside the database, and conflicts between answers to questionnaires and the intended, declarative semantics of the latter, as opposed to conflicts between data and integrity constraints. This work is a specific case of data cleaning [44]. It has been widely recognized that in database integration the integrated data may be inconsistent with the integrity constraints. A typical (theoretical) solution to the problem of database inconsistency in this context is to augment the data model to represent disjunctive information. ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In International Conference on Very Large Data Bases, pages 371--380, 2001.
....of different values. Next, field-level similarity metrics need to be combined to determine overall similarity between two records. The second challenge is to provide user-friendly interactive tools for users to specify different transformations, and use the feedback to improve the data quality [3, 5, 15]. The third challenge is that of scale. Often it is computationally prohibitive to apply a nested loop approach to use the similarity function(s) to compute the distance between each pair of records. This issue has previously been studied in [7], which proposed an approach that first merges two ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB, pages 371--380, 2001.
....(EM) methods and support vector machines have been proposed in the literature [4, 7, 35]. The second challenge in record linkage is to provide user-friendly interactive tools for users to specify different transformations, and use the feedback to improve the data quality. Recently a few works [11, 13, 30] have been conducted to solve this problem. The third challenge is that of scale. A simple solution is to use a nested loop approach to generate the Cartesian product of records, and then use the similarity function(s) to compute the distance between each pair of records. This approach is ....
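The nested-loop approach described above can be sketched as follows, to make its quadratic cost concrete. This is a minimal illustration under assumptions: the similarity function is swapped in from Python's standard `difflib.SequenceMatcher.ratio()` rather than taken from any cited system, and the names `naive_pairs` and `threshold` are hypothetical. It enumerates each unordered pair once (the Cartesian product minus self-pairs and mirrored pairs), which is still n(n-1)/2 comparisons.

```python
from difflib import SequenceMatcher
from itertools import combinations


def naive_pairs(records, threshold=0.8):
    """Score every unordered pair of records and keep those whose
    similarity exceeds the threshold (n * (n - 1) / 2 comparisons)."""
    matches = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        # ratio() returns 2*M/T in [0, 1], M = matched chars, T = total chars
        score = SequenceMatcher(None, r1, r2).ratio()
        if score > threshold:
            matches.append((i, j, round(score, 3)))
    return matches


recs = ["John Smith", "Jon Smith", "Mary Jones"]
print(naive_pairs(recs))
```

Doubling the number of records roughly quadruples the number of calls to the similarity function, which is why the contexts call this approach prohibitive at scale.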
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB, pages 371--380, 2001.
....same object and can be merged. In the absence of such an ID, object identification techniques can be employed [13]. These techniques find duplicates automatically by evaluating the data that is available. Even though no method performs with perfect accuracy, satisfactory results have been reported [7, 14, 5]. Ensuring correctness is, in effect, the problem of grouping tuples about the same real-world object into a single tuple in the merged result. Our approach. In this paper we propose declarative merging of relational data by means of the SQL query language, so that merging can be performed by ....
....grouping) and it extends the SQL language to solve the problem of resolving conflicts among duplicates (user-defined aggregation). Once these extensions are part of a DBMS and its optimizer, they harmonize with the findings here. Galhardas et al. present a framework for declarative data cleaning [5], implemented as the Ajax tool [1]. The framework encompasses all steps of data cleaning, including operators for object identification and data merging. The authors suggest a proprietary merge operator grouping tuples over ID attributes and applying functions to resolve conflicts. Details of the ....
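The merge step described in this context (group tuples over an ID attribute, then apply a function per attribute to resolve conflicts) resembles an SQL GROUP BY with aggregates. The following is a minimal sketch of that idea only; it is not the AJAX merge operator or its syntax, and the function `merge`, the `resolve` mapping, and the chosen resolution functions (longest name, maximum age) are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable


def merge(tuples: list[dict], key: str,
          resolve: dict[str, Callable[[list[Any]], Any]]) -> list[dict]:
    """Group tuples sharing the same value of `key` and collapse each group
    into one tuple, applying a conflict-resolution function per attribute."""
    groups: dict[Any, list[dict]] = defaultdict(list)
    for t in tuples:
        groups[t[key]].append(t)  # dict keeps first-seen order of IDs
    merged = []
    for k, group in groups.items():
        out = {key: k}
        for attr, fn in resolve.items():
            # Drop missing values before resolving the conflict
            values = [t[attr] for t in group if t.get(attr) is not None]
            out[attr] = fn(values) if values else None
        merged.append(out)
    return merged


rows = [
    {"id": 1, "name": "J. Smith", "age": 41},
    {"id": 1, "name": "John Smith", "age": None},
    {"id": 2, "name": "Mary Jones", "age": 35},
]
print(merge(rows, "id", {"name": lambda v: max(v, key=len), "age": max}))
# [{'id': 1, 'name': 'John Smith', 'age': 41}, {'id': 2, 'name': 'Mary Jones', 'age': 35}]
```

Passing resolution functions as parameters keeps the grouping logic generic, in the spirit of the user-defined aggregation mentioned above.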
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon and Cristian Saita. Declarative data cleaning: language, model, and algorithms. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 371-380, Rome, Italy, 2001.
....value in doing so diminishes quickly as other aspects of KDD are weakly addressed and the analyst's goal lies in a section of the problem space. We therefore agree with [16, 18] to call for a focus on other aspects of KDD to realize the goal of data mining. Some such work has appeared recently [3, 9, 33], and CrystalBall's contribution in this aspect is to eliminate the need for engineering new algorithms. Given the engine and the variant specification language, new parameters can be considered and compositions developed. This in turn eliminates programming, frees the analyst from the restrictions ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of VLDB, Italy, 2001.
....while Bob's records would have three purposes: purchase, registration and recommendations. The set of purposes combined with the information in the privacy authorizations table will be used to restrict access. Data Preprocessing The Data Accuracy Analyzer may run some data cleansing functions [19, 41] against the data to check for accuracy either before or after data insertion, thus addressing the Principle of Accuracy. In our example, Alice's address may be checked against a database of street addresses to catch typos in the address or zip code. 4.2.3 Queries Queries are submitted to ....
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative data cleaning: Language, model, and algorithms. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pages 371--380, 2001.
No context found.
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian-Augustin Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB'01), Rome, Italy, September 2001.
....This section gives a presentation by example of the five logical operators (mapping, view, matching, clustering, and merging) offered by our declarative language for expressing data cleaning transformations. A formal description of our operators and the BNF grammar for their syntax can be found in [7]. In [7] we also show how to express data cleaning transformations existing in commercial systems or introduced in the research literature as a composition of our operators. 3.1 Mapping Operator A mapping operator takes a single relation as input and produces one or more relations. It expresses ....
[Article contains additional citation context not shown here]
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. Extended version of the VLDB'01 paper, 2001.
No context found.
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian-Augustin Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In VLDB, Rome, Italy, September 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. In International Conference on Very Large Databases (VLDB), pages 371--380, Rome, Italy, 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. In International Conference on Very Large Databases, pages 371--380, Rome, Italy, 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model and algorithms. In Proceedings of the 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model and algorithms. In Proceedings of the 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.A. Saita. Declarative data cleaning: Language, model, and algorithms. In VLDB 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, C.-A. Saita, Declarative data cleaning: Language, model, and algorithms, Proceedings of the 27
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the 27th VLDB Int. Conf. on Very Large Databases, Roma, Italy, September 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the 27th VLDB Int. Conf. on Very Large Databases, Roma, Italy, September 2001.
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, C.-A. Saita, Declarative data cleaning: Language, model, and algorithms, Proceedings Roma, Italy, 2001
No context found.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In Proc. of the 27th Int. Conf. on Very Large Databases, Roma, Italy, September 2001.
No context found.
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian Saita. Declarative data cleaning: Language, model, and algorithms. In Proceedings of the 27th International Conference on Very Large Databases, pages 371--380, Roma, Italy, September 11-14, 2001.
CiteSeer.IST - Copyright Penn State and NEC