## Improving Data Quality: Consistency and Accuracy

### Download Links

- [www.vldb.org]
- [dc-pubs.dbs.uni-leipzig.de]
- DBLP

Citations: 72 (15 self-citations)

### Citations

1958 | The Probabilistic Method
- Alon, Spencer
- 2000
Citation Context: ... attribute values to be modified, and the “closeness” of the new value to the original value. Following the practice of US national statistical agencies [13, 35], we assume that a weight in the range **[0, 1]** is associated with each attribute A of each tuple t in the dataset D, denoted by w(t, A) (see the wt rows in Fig. 1(a)). The weight reflects the confidence of the accuracy placed by the user in the a...
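
The weighted cost idea in this snippet can be made concrete with a small sketch. It is hypothetical: the excerpt does not say how “closeness” is measured, so this assumes Levenshtein edit distance normalized by the longer value's length, multiplied by the weight w(t, A). The names `edit_distance` and `repair_cost` are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def repair_cost(weight: float, old: str, new: str) -> float:
    """Assumed cost of changing an attribute value from `old` to `new`.

    `weight` plays the role of w(t, A) in [0, 1]: a higher user confidence
    in the original value, or a more distant new value, raises the cost.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    longest = max(len(old), len(new))
    if longest == 0:
        return 0.0
    return weight * edit_distance(old, new) / longest
```

For example, `repair_cost(0.5, "kitten", "sitting")` is 0.5 × 3/7, about 0.214, while leaving a value unchanged always costs 0.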

361 | Consistent Query Answers in Inconsistent Databases.
- Arenas, Bertossi
- 2003
Citation Context: ... With this comes the need for effective methods to improve the quality of data, or to clean data. Inconsistencies, errors and conflicts in a database often emerge as violations of integrity constraints **[2, 29]**. A central problem for data cleaning is how to make the data consistent: given a dirty database D, we want to minimally edit the data in D such that it satisfies certain constraints. In other words, ...

140 | Approximating Maximum Independent Sets by Excluding Subgraphs
- Boppana, Halldorsson
- 1992
Citation Context: ... of D′ such that C ⊨ Σ. ✷ Proof sketch: This is verified by reduction from the independent set problem, which is NP-complete (cf. [17]). ✷ Greedy algorithms do provide some approximation guarantees **[7]** for finding such a set C. However, unless for each CFD ϕ ∈ Σ the number of tuples that violate ϕ with another tuple is bounded by a small constant, the approximation factor grows with the size of the...
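
The greedy approach can be illustrated on a conflict graph whose vertices are tuples and whose edges join pairs of tuples that jointly violate a constraint; a conflict-free set of tuples is then an independent set. This is a minimal sketch under simplifying assumptions: it handles a single standard FD X → Y (a special case of a CFD) and uses the common minimum-degree greedy heuristic, which is not necessarily the exact procedure analysed in [7]. All names are hypothetical.

```python
from itertools import combinations


def conflict_graph(tuples, lhs, rhs):
    """Adjacency sets: i -- j iff tuples i and j agree on every `lhs`
    attribute but differ on some `rhs` attribute (an FD violation)."""
    adj = {i: set() for i in range(len(tuples))}
    for i, j in combinations(range(len(tuples)), 2):
        s, t = tuples[i], tuples[j]
        if all(s[a] == t[a] for a in lhs) and any(s[b] != t[b] for b in rhs):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_independent_set(adj):
    """Repeatedly keep a minimum-degree vertex and discard its neighbours.

    The result is a maximal set of pairwise non-conflicting tuples,
    though not necessarily a maximum one."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    chosen = []
    while remaining:
        # Ties on degree are broken by the smaller tuple index.
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for ns in remaining.values():
            ns -= removed  # drop edges to vertices that left the graph
    return sorted(chosen)
```

Because every kept tuple eliminates all of its neighbours, the quality of the result depends on how many conflicts each tuple has, which matches the caveat in the snippet about bounded violations per tuple.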

127 | Minimal-Change Integrity Maintenance Using Tuple Deletions
- Chomicki, Marcinkowski
- 2005
Citation Context: ... guaranteeing that |dif(Repr, D_opt)|/|D_opt| is within a predefined bound ε. Here dif counts the attribute-level differences between two databases. There has been a host of work on data cleaning (e.g., **[2, 5, 25, 10, 14, 34]**). However, to develop practical data-cleaning tools there is much more to be done. First, the previous work often models the consistency of data using traditional dependencies, e.g., functional depen...
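
The accuracy bound can be checked mechanically once databases are represented concretely. A minimal sketch, assuming each database is a dict mapping a tuple identifier to a dict of attribute values, that Repr and D_opt cover the same tuples and attributes, and that |D_opt| means the number of attribute values (tuples × attributes); the excerpt does not say how |D_opt| is measured, so that normalization is an assumption. The helpers `num_values` and `within_bound` are hypothetical, and `dif` is only an illustrative implementation of the measure named in the snippet.

```python
def dif(db1, db2):
    """Set of (tuple_id, attribute) pairs whose values differ between db1 and db2.

    Both databases must contain the same tuple ids and attributes."""
    if db1.keys() != db2.keys():
        raise ValueError("databases must contain the same tuple ids")
    diffs = set()
    for tid, row in db1.items():
        other = db2[tid]
        if row.keys() != other.keys():
            raise ValueError(f"tuple {tid!r} has mismatched attributes")
        for attr, value in row.items():
            if other[attr] != value:
                diffs.add((tid, attr))
    return diffs


def num_values(db):
    """Number of attribute values in db (the assumed meaning of |D|)."""
    return sum(len(row) for row in db.values())


def within_bound(repr_db, opt_db, eps):
    """True iff |dif(Repr, D_opt)| / |D_opt| <= eps."""
    total = num_values(opt_db)
    if total == 0:
        return True
    return len(dif(repr_db, opt_db)) / total <= eps
```

With two tuples of two attributes each and one differing value, the ratio is 1/4, so the check passes for ε = 0.25 and fails for ε = 0.2.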

100 | A cost-based model and effective heuristic for repairing constraints by value modification
- Bohannon, Flaster, et al.
- 2005
Citation Context: ... guaranteeing that |dif(Repr, D_opt)|/|D_opt| is within a predefined bound ε. Here dif counts the attribute-level differences between two databases. There has been a host of work on data cleaning (e.g., **[2, 5, 25, 10, 14, 34]**). However, to develop practical data-cleaning tools there is much more to be done. First, the previous work often models the consistency of data using traditional dependencies, e.g., functional depen...

61 | Conditional functional dependencies for data cleaning
- Bohannon, Fan, et al.
- 2007
Citation Context: ... with the size of the database [19]. A simpler approach is to compute the set C′ of tuples that do not violate any constraint in Σ. This clearly does not give us a maximal set of tuples but as shown in **[6]** it can be efficiently computed using SQL queries. Moreover, in practice one can often expect this set to be fairly large. Indeed, the typical error rate of real-world data in enterprises is 1%–5% [31...
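
Computing C′ with SQL can be illustrated using Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `cust` table, its columns, the data, and the two constraints phi1 and phi2 below are hypothetical, and the queries are hand-written for these two CFDs rather than taken from [6]. A tuple is in C′ exactly when neither query reports it.

```python
import sqlite3

# Hypothetical constraints on a hypothetical cust(id, cc, ac, zip, street, city) table:
#   phi1 (variable CFD): for cc = '44', tuples with equal zip must have equal street.
#   phi2 (constant CFD): cc = '01' and ac = '908' implies city = 'MH'.
VIOLATIONS_SQL = """
SELECT t1.id FROM cust AS t1
JOIN cust AS t2
  ON t1.cc = t2.cc AND t1.zip = t2.zip AND t1.street <> t2.street
WHERE t1.cc = '44'
UNION
SELECT id FROM cust
WHERE cc = '01' AND ac = '908' AND city <> 'MH'
"""


def clean_tuple_ids(rows):
    """Return the sorted ids of tuples that violate neither phi1 nor phi2."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cust (id INTEGER PRIMARY KEY, cc TEXT, ac TEXT,"
            " zip TEXT, street TEXT, city TEXT)"
        )
        conn.executemany("INSERT INTO cust VALUES (?, ?, ?, ?, ?, ?)", rows)
        # The self-join is symmetric, so both tuples of a violating pair are reported.
        query = f"SELECT id FROM cust WHERE id NOT IN ({VIOLATIONS_SQL}) ORDER BY id"
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()
```

Pair violations (phi1) need a self-join, whereas single-tuple violations (phi2) need only a selection; both are ordinary queries a DBMS can evaluate efficiently.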

51 | Constraint-Generating Dependencies
- Baudinet, Chomicki, et al.
- 1999
Citation Context: ... our algorithms correctly fix noise, they may also introduce new noise. This is an issue not yet well studied by previous work. 8. Related Work: A variety of constraint formalisms have been proposed **[6, 4, 8, 26, 27]**. Except for [6], these formalisms have not been applied in the context of data cleaning. CFDs are proposed in [6], which studies satisfiability and implication analyses of CFDs, and gives SQL techniq...

17 | DISTANCE-SAT: complexity and algorithms
- Bailleux, Marquis
- 1999
Citation Context: ... problem is NP-complete. Moreover, it remains intractable if one considers standard FDs only. ✷ Proof sketch: The NP-hardness is verified by reduction from the distance-SAT problem, which is NP-complete **[3]**. That is, to determine, given a propositional logic formula φ, an initial truth assignment ρ1, and a constant k, whether there exists a truth assignment ...
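
To make the distance-SAT problem concrete, here is a brute-force decision procedure. It is a sketch, assuming φ is in CNF, written as lists of nonzero integers (positive for a variable, negative for its negation), that every variable of φ appears in ρ1, and that distance counts variables whose truth value differs from ρ1. It enumerates candidate assignments, so it is exponential and meant only for tiny inputs, consistent with the problem being NP-complete. Function names are hypothetical.

```python
from itertools import combinations


def satisfies(cnf, assignment):
    """True iff every clause has a literal made true by `assignment` (var -> bool)."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf
    )


def distance_sat(cnf, rho1, k):
    """Return an assignment that satisfies `cnf` and differs from `rho1` on
    at most k variables, or None if no such assignment exists."""
    variables = sorted(rho1)
    for d in range(min(k, len(variables)) + 1):
        # Try every way of flipping exactly d variables of rho1.
        for flipped in combinations(variables, d):
            candidate = dict(rho1)
            for v in flipped:
                candidate[v] = not candidate[v]
            if satisfies(cnf, candidate):
                return candidate
    return None
```

The search tries distance 0, then 1, and so on, so the first assignment it returns is one closest to ρ1.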

15 | Errors detection and correction in large scale data collecting
- Bruni, Sassano
Citation Context: ... research on constraint-based data cleaning has mostly focused on two topics introduced in [2]: repair is to find another database that is consistent and minimally differs from the original database (e.g., **[2, 5, 25, 9, 10, 14]**); and consistent query answer is to find an answer to a given query in every repair of the original database (e.g., [2, 10, 24, 34]). Most earlier work (except [5, 9, 14, 34]) considers traditional f...
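
The difference between the two topics, repair and consistent query answering, can be seen on a toy case. This minimal sketch assumes a single key constraint and subset repairs, where each repair keeps exactly one tuple per key value; that is only one repair semantics and differs from repairs by value modification such as Bohannon et al.'s cost-based model listed on this page. A consistent answer is one returned by the query on every repair. The function names are hypothetical.

```python
from itertools import product


def repairs(rows, key):
    """Enumerate subset repairs w.r.t. `key`: keep exactly one tuple per key value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)


def consistent_answers(rows, key, query):
    """Answers returned by `query` (a function from a list of rows to hashable
    answers) on every repair, i.e. the intersection over all repairs."""
    answer_sets = [set(query(r)) for r in repairs(rows, key)]
    return set.intersection(*answer_sets)
```

For example, if the key is `name` and Mike is recorded in both EDI and NYC while Rick is recorded only in EDI, the query "who lives in EDI?" has the consistent answer {Rick}: one repair places Mike in NYC, so Mike is not returned on every repair.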

12 | Conditional dependencies for horizontal decompositions
- De Bra, Paredaens
- 1983
Citation Context: ... our algorithms correctly fix noise, they may also introduce new noise. This is an issue not yet well studied by previous work. 8. Related Work: A variety of constraint formalisms have been proposed **[6, 4, 8, 26, 27]**. Except for [6], these formalisms have not been applied in the context of data cleaning. CFDs are proposed in [6], which studies satisfiability and implication analyses of CFDs, and gives SQL techniq...