by Chris Giannella, Edward Robertson
Download: ftp://ftp.cs.indiana.edu/pub/techreports/TR555.ps.Z
Abstract:
We investigate the problem of defining an approximation measure for functional dependencies (FDs). For fixed sets of attributes, X and Y, an approximation measure is a function which maps relation instances to real numbers. Intuitively, the number to which an instance is mapped describes the strength of the dependency, X → Y, in that instance. We define an approximation measure for FDs based on a connection between Shannon's information theory and relational database theory. Our measure is normalized to lie between zero and one (inclusive), and maps a relation instance to zero if and only if X → Y holds in the instance. Hence, the smaller the number to which an instance is mapped, the "closer" X → Y is to being an FD in the instance. To put our measure in context, we compare it to a slight variation of a measure previously defined by Kivinen and Mannila, g3. We denote the variation as g3, although our results apply essentially unchanged to g3. The purpose of comparing our measure with g3 is to develop a deeper understanding not only of our measure but also of g3. Moreover, we gain a deeper understanding of the natural intuitive notion of an approximate FD. We observe that our measure and g3 agree at their extremes but are quite different in between. As a result, we conclude that our measure and g3 are significantly different. An interesting question emerges from this conclusion: is there a rigorous way to determine when one measure better captures the meaning of the degree to which an FD is approximate?
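The two kinds of measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: `g3` follows the commonly cited Kivinen and Mannila definition (the smallest fraction of tuples that must be removed for X → Y to hold), not the variation the authors study, and `normalized_conditional_entropy` uses H(Y|X)/H(Y) as one plausible entropy-based measure that lies in [0, 1] and is zero exactly when X → Y holds. It is not claimed to be the authors' measure. Both function names and the dict-per-row representation are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def _project(row, attrs):
    """Return the values of `attrs` in `row` as a hashable tuple."""
    return tuple(row[a] for a in attrs)


def g3(rows, x, y):
    """Smallest fraction of rows to delete so that x -> y holds (rows non-empty)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[_project(row, x)][_project(row, y)] += 1
    # Within each X-group, keep only the most frequent Y-value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / len(rows)


def _entropy(rows, attrs):
    """Shannon entropy (in bits) of the empirical distribution over `attrs`."""
    n = len(rows)
    counts = Counter(_project(row, attrs) for row in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def normalized_conditional_entropy(rows, x, y):
    """H(Y|X) / H(Y): an assumed normalization, zero iff x -> y holds."""
    h_y = _entropy(rows, y)
    if h_y == 0:
        return 0.0  # Y is constant, so x -> y holds trivially.
    h_y_given_x = _entropy(rows, x + y) - _entropy(rows, x)
    return max(0.0, h_y_given_x / h_y)


# Hypothetical example instance: one tuple violates a -> b.
rows = [
    {"a": 1, "b": "p"},
    {"a": 1, "b": "q"},
    {"a": 2, "b": "q"},
    {"a": 2, "b": "q"},
]
print(g3(rows, ["a"], ["b"]))
print(normalized_conditional_entropy(rows, ["a"], ["b"]))
```

On an instance where the dependency holds, both functions return zero. On instances that violate it, the two values generally differ, which is the kind of disagreement between measures that the abstract discusses.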
Citations
85 | The theory of probabilistic databases – Cavallo, Pittarelli
62 | Measures of association for cross classifications – Goodman, Kruskal - 1954
46 | Approximate inference of functional dependencies from relations – Kivinen, Mannila - 1995
33 | Algorithms for inferring functional dependencies – Mannila, Raiha - 1994
26 | Discovering functional and inclusion dependencies in relational databases – Kantola, Mannila, et al. - 1992
26 | Dependency inference – Mannila, Raiha - 1987
23 | Information dependencies – Dalkilic, Robertson - 2000
20 | TANE: An efficient algorithm for discovering functional and approximate dependencies – Huhtala, Kärkkäinen, et al. - 1999
12 | Efficient discovery of functional dependencies and Armstrong relations – Lopes, Petit, et al. - 2000
9 | Some analytic tools for the design of relational database systems – Nambiar - 1980
8 | Database Management Systems, Second Edition – Ramakrishnan, Gehrke - 2000
8 | Measures of association for cross classifications – Goodman, Kruskal - 1954
7 | FUN: an efficient algorithm for mining functional and embedded dependencies – Novelli, Cicchetti - 2001
6 | Theory of random observables in relational databases – Malvestuto - 1983
4 | FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances – Wyss, Giannella, et al. - 2001
3 | FUN: an efficient algorithm for mining functional and embedded dependencies – Novelli, Cicchetti
3 | FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances – Wyss, Giannella, et al. - 2001
2 | TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies – Huhtala, Kärkkäinen, et al. - 1999
1 | Probabilistic data dependencies – Piatetsky-Shapiro - 1992