  On an Information Theoretic Approximation Measure for Functional Dependencies

by Chris Giannella, Edward Robertson
ftp://ftp.cs.indiana.edu/pub/techreports/TR555.ps.Z

Abstract:

We investigate the problem of defining an approximation measure for functional dependencies (FDs). For fixed sets of attributes, X and Y, an approximation measure is a function which maps relation instances to real numbers. Intuitively, the number to which an instance is mapped describes the strength of the dependency, X → Y, in that instance. We define an approximation measure for FDs based on a connection between Shannon's information theory and relational database theory. Our measure is normalized to lie between zero and one (inclusive), and maps a relation instance to zero if and only if X → Y holds in the instance. Hence, the smaller the number to which an instance is mapped, the "closer" X → Y is to being an FD in the instance. To put our measure in context, we compare it to a slight variation of a measure previously defined by Kivinen and Mannila, g3. We denote the variation as g3, although our results apply essentially unchanged to g3. The purpose of comparing our measure with g3 is to develop a deeper understanding not only of our measure but also of g3. Moreover, we gain a deeper understanding of the natural intuitive notion of an approximate FD. We observe that our measure and g3 agree at their extremes but are quite different in between. As a result, we conclude that our measure and g3 are significantly different. An interesting question emerges from this conclusion: is there a rigorous way to determine when one measure better captures the meaning of the degree to which an FD is approximate?
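
The two kinds of measure contrasted in the abstract can be illustrated with a minimal Python sketch. The g3 function follows the standard Kivinen and Mannila definition (the minimum fraction of rows that must be removed for X → Y to hold). The abstract does not give the formula for the information-theoretic measure, so the second function is only an assumption: it uses the conditional entropy H(Y|X) divided by H(Y), one plausible normalization that lies in [0, 1] and is zero exactly when X → Y holds. The function names and the (x, y) row encoding are hypothetical and are not taken from the report.

```python
from collections import Counter, defaultdict
from math import log2


def _entropy(counts, total):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    return -sum((c / total) * log2(c / total) for c in counts if c)


def g3(rows):
    """Kivinen-Mannila g3: minimum fraction of rows to delete so X -> Y holds.

    rows is a non-empty list of (x, y) pairs, where x and y are the
    (hashable) projections of a tuple onto X and Y.
    """
    groups = defaultdict(Counter)
    for x, y in rows:
        groups[x][y] += 1
    # For each X-value keep only the rows carrying its most frequent Y-value.
    kept = sum(max(ys.values()) for ys in groups.values())
    return 1 - kept / len(rows)


def normalized_conditional_entropy(rows):
    """ASSUMED normalization H(Y|X) / H(Y); not confirmed by the abstract."""
    n = len(rows)
    h_y = _entropy(Counter(y for _, y in rows).values(), n)
    if h_y == 0:
        # Y is constant, so X -> Y holds trivially.
        return 0.0
    groups = defaultdict(Counter)
    for x, y in rows:
        groups[x][y] += 1
    # H(Y|X) = sum over x of p(x) * H(Y | X = x)
    h_y_given_x = sum(
        (sum(ys.values()) / n) * _entropy(ys.values(), sum(ys.values()))
        for ys in groups.values()
    )
    return h_y_given_x / h_y


exact_fd = [(1, "a"), (1, "a"), (2, "b"), (3, "b")]   # X -> Y holds
near_fd = [(1, "a"), (1, "a"), (1, "b"), (2, "b")]    # one violating row
```

On exact_fd both functions return zero, matching the abstract's remark that the measures agree at that extreme. On near_fd, g3 is 0.25, while the assumed entropy-based value is a different number strictly between zero and one, which gives a small instance of the two measures differing in between.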

Citations

85 The theory of probabilistic databases – Cavallo, Pittarelli
62 Measures of association for cross classifications – Goodman, Kruskal - 1954
46 Approximate inference of functional dependencies from relations – Kivinen, Mannila - 1995
33 Algorithms for inferring functional dependencies – Mannila, Raiha - 1994
26 Discovering functional and inclusion dependencies in relational databases – Kantola, Mannila, et al. - 1992
26 Dependency inference – Mannila, Raiha - 1987
23 Information dependencies – Dalkilic, Robertson - 2000
20 TANE: An efficient algorithm for discovering functional and approximate dependencies – Huhtala, Kärkkäinen, et al. - 1999
12 Efficient discovery of functional dependencies and Armstrong relations – Lopes, Petit, et al. - 2000
9 Some analytic tools for the design of relational database systems – Nambiar - 1980
8 Database Management Systems Second Edition – Ramakrishnan, Gehrke - 2000
8 Measures of association for cross classifications – Goodman, Kruskal - 1954
7 Fun: an efficient algorithm for mining functional and embedded dependencies – Novelli, Cicchetti - 2001
6 Theory of random observables in relational databases – Malvestuto - 1983
4 Fastfds: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances – Wyss, Giannella, et al. - 2001
3 FUN: an efficient algorithm for mining functional and embedded dependencies – Novelli, Cicchetti
3 FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances – Wyss, Giannella, et al. - 2001
2 TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies – Huhtala, Kärkkäinen, et al. - 1999
1 Probabilistic data dependencies – Piatetsky-Shapiro - 1992