24 citations found.
L. A. Goodman and W. H. Kruskal. Measures of association for cross classifications. J. of Amer. Stat. Assoc., 49:732--764, 1954.

This paper is cited in the following contexts:
Characterization of Unsupervised Clusters - With The Simplest   (Correct)

....numbers of clusters, etc. [RF01a]. These properties are checked by measures which evaluate the link between the two partitions X and Y upon a same set on the basis of a contingency table. In Double Clust, the objective function is the asymmetrical measure designed by Goodman and Kruskal [GK54]. It is evaluated on a co-occurrence table $(p_{ij})$. It has been shown in [RF01a] that this measure discriminates well, even in a noisy context, the set of bipartitions regarding the intensity of the functional link existing between both partitions. $p_{ij}$ is the frequency of relations between an ....

L. A. Goodman and W. H. Kruskal. Measures of association for cross classification. Journal of the American Statistical Association, 49:732--764, 1954.


Using Horizontal-Vertical Decompositions to Improve .. - Giannella..   (Correct)

....considered the concept of an FD approximately holding in an instance and have developed measures to characterize the degree of approximation. Piatetsky-Shapiro [26] describes a measure derived from probabilistic considerations (this measure corresponds to the measure of Goodman and Kruskal [8]). Kivinen and Mannila [16] propose and evaluate three different measures derived from pragmatic considerations. One of their measures, $g_3$, correlates with the idea of correction that we use. Huhtala et al. [11] develop an algorithm for efficiently discovering all AFDs in a given instance ....

Goodman L. and Kruskal W. Measures of associations for cross classifications. Journal of the American Statistical Association 49 (1954), 732--764.


Measures of Distinctness for Random Partitions and Compositions .. - Hwang, Yeh (1997)   (4 citations)  (Correct)

.... of measures of association for data is an important issue in many quantitative problems arising in diverse disciplines such as political science, psychology and sociology; and in rank statistics, Kendall's $\tau$ and Spearman's $\rho$ are commonly used measures of association (or disarray) of data; cf. [11, 22, 29]. In probability theory, the Kolmogorov distance and the total variation distance between two distributions are frequently used measures of closeness. This paper is concerned with problems of the following type: [Footnote: Accepted for publication in Advances in Applied Mathematics.] Given a random (under ....

L. A. Goodman and W. H. Kruskal. Measures of association for cross classifications. Springer Series in Statistics, 1. Springer-Verlag, New York, (1979).


Characterization of Unsupervised Clusters With the .. - Robardet..   (Correct)

....different numbers of clusters, etc. [RF01a]. These properties are checked by measures which evaluate the link between the two partitions upon a same set on the basis of a contingency table. In Double Clust, the objective function is the asymmetrical measure designed by Goodman and Kruskal [GK54]. It is evaluated on a co-occurrence table. It has been shown in [RF01a] that this measure discriminates well, even in a noisy context, the set of bipartitions regarding the intensity of the functional link existing between both partitions. is the frequency of relations between an ....

L. A. Goodman and W. H. Kruskal. Measures of association for cross classification. Journal of the American Statistical Association, 49:732--764, 1954.


Automatic Noun Classification by Using Japanese-English Word Pairs - Inoue (1991)   (1 citation)  (Correct)

....but some semantically separated clusters are acquired if the threshold is used. It is possible to compare clusters derived from this experiment with semantic categories which are used in our automatic interpreting telephony system. We used expression (2), which was defined by Goodman and Kruskal [Goodman], in order to objectively compare them. [Table: An example of the classification of nouns. Surviving English glosses: list, form, material, hope, document, abstract, program, decision, presentation, speech, talk, slide, draft, station.] ....

Goodman, L. A., and Kruskal W.H. "Measures of Association for Cross Classifications", J. Amer. Statist. Assoc. 49,


On an Information Theoretic Approximation Measure for.. - Giannella, Robertson   (Correct)

....differences are described in section 5. Piatetski Shapiro defines probabilistic data dependencies in [13]. Based on probabilistic data dependencies he goes on to define a normalized measure of association which corresponds to the association measure proposed previously by Goodman and Kruskal [3]. can be used to define an approximation measure for AFDs (although not discussed in [3] or [13]). This measure behaves quite differently than our information theoretic measure and $g_3$. These differences are described in section 5. Three different approximation measures for AFDs are defined in ....

.... dependencies in [13]. Based on probabilistic data dependencies he goes on to define a normalized measure of association which corresponds to the association measure proposed previously by Goodman and Kruskal [3] can be used to define an approximation measure for AFDs (although not discussed in [3] or [13]). This measure behaves quite differently than our information theoretic measure and $g_3$. These differences are described in section 5. Three different approximation measures for AFDs are defined in [6] (and called error measures). The definitions of these three measures are based ....

[Article contains additional citation context not shown here]

Goodman L. and Kruskal W. Measures of associations for cross classifications. Journal of the American Statistical Association, 49:732--764, 1954.


An Information-Theoretic External Cluster-Validity Measure - Dom (2001)   (10 citations)  (Correct)

....$|C| - 1 \cdot Q_0(C, K)$ (18) Clearly $Q_2 = 0$ can only be approached as $n \to \infty$. 5. A Survey of Other External Validity Measures. An extensive review of related association measures prior to 1959 including work in the late 19th century can be found in two papers by Goodman and Kruskal [5, 6]. 5.1. Classification Error. An external measure that is sometimes used in cases where the number of clusters is equal to the number of classes is classification error. If the rows and columns of H are made to correspond by associating the majority class in each cluster with the cluster itself, ....

L. A. Goodman and W. H. Kruskal. Measures of association for cross classification. Journal of the American Statistical Association, 49:732--764, December 1954.


Structural analysis of conserved base-pairs in protein-DNA.. - Mirny, Gelfand (2000)   (Correct)

.... $-\sum_{x=A,C,G,T} f_i(x) \log f_i(x)$ (1) where $f_i(x)$ is a frequency of nucleotide x in position i of the site. Next we compute the correlation between S and n using three different measures: the linear correlation coefficient r, $\chi^2$ association [25] and $2 \times 2$ association measure $\gamma$ [26]. The correlation coefficient measures the degree of linear correlation between S and n, while $\chi^2$ and $\gamma$ can identify a non-linear association between the variables. For all three measures we compute statistical significance $P_r$, $P_{\chi^2}$, $P_\gamma$ as the probability of observed association under the ....

.... $\rho_{11}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$; $\rho_{12}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$; $\rho_{21}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$; $\rho_{22}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$ (3) Then the association between S and n is measured as [26] $\gamma = \frac{\rho_{11}\rho_{22} - \rho_{12}\rho_{21}}{\rho_{11}\rho_{22} + \rho_{12}\rho_{21}}$ (4) Results. Table 1 summarizes results for all five proteins. Strikingly, for all proteins except MetJ a strong negative correlation between the variability and the number of protein-nucleotide interactions is observed. In other ....
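
As an illustrative aside (not part of the cited paper), the 2 x 2 association measure quoted in this context can be sketched in Python. The extract lost its arithmetic operators, so the sketch assumes the standard Goodman-Kruskal gamma for a 2 x 2 table, which coincides with Yule's Q: the difference between the diagonal and off-diagonal cell products, divided by their sum. The function name `gamma_2x2` and the example counts are hypothetical.

```python
def gamma_2x2(n11, n12, n21, n22):
    """Goodman-Kruskal gamma for a 2 x 2 table of counts (equals Yule's Q).

    Assumed form: (n11*n22 - n12*n21) / (n11*n22 + n12*n21).
    """
    concordant = n11 * n22  # product of the diagonal cells
    discordant = n12 * n21  # product of the off-diagonal cells
    total = concordant + discordant
    if total == 0:
        raise ValueError("gamma is undefined when n11*n22 + n12*n21 == 0")
    return (concordant - discordant) / total


# Hypothetical counts: most positions fall in the off-diagonal cells,
# so the association is strongly negative.
print(gamma_2x2(2, 8, 7, 3))
```

The value ranges from -1 to 1, is 0 when the two cell products are equal, and changes sign if the two categories of either variable are swapped.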

LA Goodman and WH Kruskal. Measures of association for cross classifications. Springer series in statistics. Springer-Verlag, New York, (1979).


Confirmation-guided discovery of first-order rules with Tertius - Flach, LACHICHE (2000)   (2 citations)  (Correct)

....the strength of the dependency between A and B is considered somewhat dubious by many statisticians. On the other hand, since there is no generally agreed upon measure of dependency, $\Phi^2$ or a variant thereof does not seem less acceptable than many of the other measures proposed over the years [12]. [Table 5. A comparison of evaluation measures for rule discovery.] ....

L. A. Goodman and W. H. Kruskal. Measures of association for cross classifications. Springer-Verlag, 1979.


What evolution can tell us about protein-DNA interactions - Mirny   (Correct)

....[23] at each position as $S_i = -\sum_{x=A,C,G,T} f_i(x) \log f_i(x)$ (1) where $f_i(x)$ is a frequency of nucleotide x in position i in the site. Next we compute correlation between S and n. We use both traditional linear correlation coefficient r [24] and a $2 \times 2$ association measure $\gamma$ [25]. The $2 \times 2$ measure is used to compute association between categorical variables. To use it, we classify positions as being variable ($S_i$ $S_{cut}$) vs conserved ($S_i$ $S_{cut}$) and as strongly involved ($n_i$ $n_{cut}$) vs slightly involved ($n_i$ $n_{cut}$) into interactions with the protein. To ....

.... 11 = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$; $\rho_{12}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$; $\rho_{21}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$; $\rho_{22}$ = number of positions with $S_i$ $S_{cut}$ and $n_i$ $n_{cut}$ (2) Then the association between S and n is measured as [25] $\gamma = \frac{\rho_{11}\rho_{22} - \rho_{12}\rho_{21}}{\rho_{11}\rho_{22} + \rho_{12}\rho_{21}}$ (3) Table I summarizes results for all five proteins. Strikingly, for all proteins except MetJ strong negative correlation is observed. This indicates that base pairs that have more interactions with the protein n are more ....

Goodman, L. and Kruskal, W. Measures of association for cross classifications. Springer series in statistics. Springer-Verlag, New York, (1979).


Using Symbolic Data to Improve Connectionist.. - Bonnet, Perrault..   (Correct)

....of the N symbolic explicative variables may be deduced from the others. To avoid this redundancy we have to find a way to measure the improvement of the prediction accuracy when adding a new symbolic explicative variable. Goodman and Kruskal's B provides such results with two symbolic variables [5]. Let $v_B(t)$ denote a given symbolic variable. It measures the improvement in the accuracy of the prediction of $v_B(t)$ when using another symbolic variable $v_A(t)$ relative to using a random variable. If the B is null, the two variables are independent on the chosen sample. An extension of ....

L.A. Goodman and W.H. Kruskal, "Measures of association for cross-classification," Journal of the Amer. Statist. Assoc., vol. 49, pp. 732--764, 1954.


Comparing Classification Tree Structures : A Special Case .. - Israel-Cesar Lerman Mai   (Correct)

....not enough attention is paid in order to intimately take into account the specific structure of the compared relations. Thus, the reduction done in the Fowlkes and Mallows (1983) paper, for comparing two classification trees cannot be clearly justified. On the other hand, Baker (1974) uses the Goodman-Kruskal coefficient (1954) for this aim. However, the generality of this coefficient makes it not accurate enough for the concerned structures. The general method we set up (Lerman 1992) has its origin in the K. Pearson and M. G. Kendall contributions. It meets Hubert's work (1987) and makes comprehensive a large family ....

....disconnection is made by this technique between the different level partitions of a same tree. The second criticism is about producing a global coefficient $B(\alpha, \beta)$ summarizing the sequence (21) by means of a non-arbitrary function f: $B(\alpha, \beta) = f(B_l \mid 1 \le l \le n - 1)$ (24). The well-known Goodman and Kruskal coefficient (1954) gives a global comparison of two total preorders on a finite set. And then, it can be used for comparing ultrametric preordonnances associated with trees [see (19)], since an ultrametric preordonnance is a specific total preorder on P [see (13)]. In order to clearly set up the nature of this ....

GOODMAN, L.A., and KRUSKAL, W.H. (1954), "Measures of association for cross classification", Journal of the American Statistical Association, 49, 732-764.


An Application of Cluster Analysis to Health Services.. - Sugar, Lenert, Olshen (1999)   (2 citations)  (Correct)

....for consecutive values of k using a statistic that we denote by $G_{ij}$. This statistic is designed to find systematic patterns of covariation in factor data, much as the squared correlation coefficient does for continuous data; it is useful especially when these patterns are not visually obvious (Goodman and Kruskal, 1979). Specifically, the $G_{ij}$ statistic measures the drop in prediction error of one variable given knowledge of a second. In this case, we consider two consecutive clustering schemes highly correlated if knowing to which cluster a point is assigned in the first scheme makes it easier to predict to ....
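
As an illustrative aside, a "drop in prediction error of one variable given knowledge of a second" can be sketched with Goodman and Kruskal's lambda, a standard proportional-reduction-in-error measure. This is an assumption: the excerpt does not give the formula for the statistic it uses, which may be a different member of the same family (for example, Goodman and Kruskal's tau). The function name and the example table are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """lambda(column | row) for a table of counts.

    Rows index the known variable (e.g. cluster in the first scheme),
    columns index the variable being predicted (cluster in the second).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors when always guessing the overall modal column.
    errors_without = n - max(col_totals)
    # Errors when guessing the modal column within each row.
    errors_with = n - sum(max(row) for row in table)
    if errors_without == 0:
        raise ValueError("lambda is undefined when one column holds every case")
    return (errors_without - errors_with) / errors_without


# Hypothetical cross-tabulation of two consecutive clustering schemes.
schemes = [[40, 5],
           [10, 45]]
print(goodman_kruskal_lambda(schemes))
```

A value of 0 means the row variable does not help prediction at all; a value of 1 means it removes every prediction error.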

Goodman, L.A., and W.H. Kruskal. 1979 Measures of Association for Cross Classification. New York: Springer-Verlag.


Application of Sampling Methodologies to Network Traffic.. - Claffy, Polyzos, Braun (1993)   (33 citations)  (Correct)

....at the 0.05 confidence level. Unfortunately, the $\chi^2$ statistic is sensitive to the size of the data set, making it difficult to compare samples of varying sizes. Therefore, it cannot quantify significant trends when varying the sampling fraction, one of our primary concerns. Goodman and Kruskal [10] note that although useful as a test for the significance of the association between two data sets, the $\chi^2$ statistic, or any simple function of it (e.g. the significance level) cannot serve as a measure of degree of association between two sets. On the other hand, we did find significantly ....
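
As an illustrative aside, the size sensitivity described in this context can be seen in a small Python sketch: multiplying every cell of a contingency table by a constant leaves the cell proportions unchanged but multiplies Pearson's chi-square statistic by the same constant. The function name and the example tables are hypothetical, and the sketch assumes every row and column total is nonzero.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts.

    Assumes every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


small = [[30, 20],
         [20, 30]]
# Same proportions, ten times as many observations.
large = [[10 * cell for cell in row] for row in small]

print(chi_square(small))  # 4.0
print(chi_square(large))  # 40.0
```

The pattern of association is identical in both tables, yet the statistic differs by a factor of ten, which is why it is awkward for comparing samples of different sizes.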

L. Goodman and W. Kruskal. Measures of association for cross classifications. Journal of the American Statistical Association, pages 732--763, December 1954.


Internet Traffic Characterization - Claffy (1994)   (24 citations)  (Correct)

....at the 0.05 confidence level. Unfortunately, the $\chi^2$ statistic is sensitive to the size of the data set, making it difficult to compare samples of varying sizes. Therefore, it cannot quantify significant trends when varying the sampling fraction, one of our primary concerns. Goodman and Kruskal [98] note that although useful as a test for the significance of the association between two data sets, the $\chi^2$ statistic, or any simple function of it (e.g. the significance level) cannot serve as a measure of degree of association between two sets. On the other hand, we did find significantly ....

L. Goodman and W. Kruskal, "Measures of association for cross classifications," Journal of the American Statistical Association, pp. 732--763, December 1954.


Attribute Similarity and Event Sequence Similarity in Data Mining - Ronkainen (1998)   (4 citations)  (Correct)

....fixes all the other cell values. This means that only one cell value can be assigned at will, and therefore, we say that any internal measure of similarity has 1 degree of freedom. There are, of course, numerous ways of defining measures for the strength of association between attributes; see [GK79] for some possible measures. One of the possibilities is the $\chi^2$ test statistic, which measures the deviation between the observed and expected values of the cells in the contingency table under the independence assumption. In the case of two binary attributes, the $\chi^2$ test statistic is ....

....the index i describes the values of the attribute A and the index j the values of the attribute B. As a measure of association between attributes we could also use any of the many modifications of the $\chi^2$ test statistic, like Yule's, Pearson's or Tschuprow's coefficients of association [YK58, GK79]. The $\chi^2$ test statistic determines whether two attributes A and B are independent or not. If the attributes are independent, the value of the measure is 0. When the value of $\chi^2$ is higher than a cutoff value, the attributes are considered to be somehow dependent on each other. The cutoff ....

[Article contains additional citation context not shown here]

L. A. Goodman and W. H. Kruskal. Measures of Association for Cross Classifications. Springer-Verlag, Berlin, Germany, 1979.


Similarity of Attributes by External Probes - Gautam Das (1997)   (14 citations)  (Correct)

....depends only on the values on the A and B columns of r. As there are only 4 possible value combinations, we can express the sufficient statistics for any internal measure by the familiar 2 by 2 contingency table. We can measure the strength of association between A and B in numerous ways; see (Goodman Kruskal 1979) for a compendium of methods. Possibilities include the $\chi^2$ test statistic, which measures the deviation of the observed values from the expected values under the assumption of independence. There exist several modifications of this measure. If one would like to focus on the positive ....

....number of rows in the whole dataset r. To obtain the distance measure we sum over all the probes D: d ;P (A, B) = $\sum_{D \in P} F(A, B, D)$. This measure is $\chi^2$ distributed with $|P|$ degrees of freedom. One might be tempted to use d ;P or some similar notion as a measure of similarity. However, as (Goodman Kruskal 1979) puts it, "The fact that an excellent test of independence may be based on $\chi^2$ does not at all mean that $\chi^2$, or some simple function of it, is an appropriate measure of degree of association." One well known problem with the $\chi^2$ measure is that it is very sensitive to cells with small ....

Goodman, L. A., and Kruskal, W. H. 1979. Measures of Association for Cross Classifications. Springer-Verlag.


Similarity of Attributes by External Probes - Das, Mannila, Ronkainen (1997)   (14 citations)  (Correct)

....statistics for any internal measure by the familiar 2 by 2 contingency table:

          B = 1     B = 0     sum
A = 1     n_11      n_10      n_1·
A = 0     n_01      n_00      n_0·
sum       n_·1      n_·0      n

Here n_·1 = n_11 + n_01 etc. We can measure the strength of association between A and B in numerous ways; see [9] for a compendium of methods. Possibilities include the $\chi^2$ test statistic, measuring the deviation of the observed values n_ij from the expected values under the assumption of independence. Note that this measure is not oblivious to double zeroes, for obvious reasons. There exist several ....

....$D \in P$ F(A, B, D) and this will be $\chi^2$ distributed with $|P|$ degrees of freedom. If the value $F_P(A, B)$ is large, then A and B are not behaving in the same way with respect to the attributes in P. One might be tempted to use $F_P$ or some similar notion as a measure of similarity. However, as [9] puts it, "The fact that an excellent test of independence may be based on $\chi^2$ does not at all mean that $\chi^2$, or some simple function of it, is an appropriate measure of degree of association." [Footnote 4: or we have forgotten the test statistics] [9] goes on to use different models based on ....

[Article contains additional citation context not shown here]

L. A. Goodman and W. H. Kruskal. Measures of Association for Cross Classifications. Springer-Verlag, 1979.


Evaluating Strategies for Similarity Search on the Web - Haveliwala, Gionis, Klein.. (2002)   (20 citations)  (Correct)

No context found.

L. A. Goodman and W. H. Kruskal. Measures of association for cross classifications. J. of Amer. Stat. Assoc., 49:732--764, 1954.


Comparing and Aggregating Rankings with Ties - Fagin, Kumar, Mahdian.. (2003)   (2 citations)  (Correct)

No context found.

L. A. Goodman and W. H. Kruskal. Measures of association for cross classification. Journal of the American Statistical Association, 49:732--764, 1954.


Analysis of Local or Asymmetric Dependencies in Contingency.. - Bernard   (Correct)

No context found.

GOODMAN, L. A., AND KRUSKAL, W. H. Measures of association for cross classifications. II: Further discussion and references. J. Amer. Statist. Assoc. 54 (1959), 123--163.


Reinterpreting the Category Utility Function - Boris Mirkin Mirkin (2001)   (1 citation)  (Correct)

No context found.

Goodman, L.A. & Kruskal, W. (1954). Measures of Association for Cross Classifications. Journal of American Statistical Association, 49, 732-764.

CiteSeer.IST - Copyright Penn State and NEC