Hoaglin, D. C., Mosteller, F., Tukey, J. W. (1983). (Eds.) Understanding Robust and Exploratory Data Analysis. Wiley, N. Y.
....that distinguish different modes of behavior, functional simplification of low-dimensionality relationships, and two-way tables such as contingency tables. These are just a few simple examples; a sophisticated analysis combines these and many others in the construction of a global picture of the data [Tukey77, Hoaglin83]. We are developing an assistant for intelligent data exploration, Aide, to assist human analysts with EDA [StAmant95]. Aide takes a script-based planning approach to EDA. Data-directed mechanisms extract simple observations and suggestive indications from the data. Scripted combinations of ....
David C. Hoaglin, Frederick Mosteller, and John W. Tukey. Understanding robust and exploratory data analysis. Wiley, 1983.
....we need to explore the data. We need to identify suggestive features of the data, interpret the patterns these features indicate, and generate hypotheses to explain the patterns. Successive steps through the process lead us gradually to a better understanding of underlying structure in the data [11, 6]. Exploratory data analysis (EDA) [16] gives us a powerful set of operations for this process: we fit linear and higher-order functions to relationships; we compose and transform variables with arithmetic functions; we separate relationships into partitions and clusters; we extract features ....
David C. Hoaglin, Frederick Mosteller, and John W. Tukey. Understanding robust and exploratory data analysis. Wiley, 1983.
....of cliques that belong to a neighborhood system; clique potential function; differential operator that models the dependency of the pixels in the clique. The operator is usually chosen such that the prior distribution models the smoothness of the image. Robust forms of potential functions [11], [12] are used to impose weak continuity on the solution and to allow for edges in the MAP estimate. Using the conditional probability in (4) and the prior probability in (6), the MAP estimation problem in (3) becomes (7). In [2] the Huber function is used as the potential function. In [4] a set of ....
....potentially be implemented with byte buffers for the size sliding window. Currently, the algorithms are implemented with size buffers for an image. In some MAP-estimation-based approaches, prior knowledge of smoothness is modeled by the Gibbs distribution with a so-called redescending function [12] as a potential function. When a redescending function is used to model , the feasible image set should be enforced strictly [3]. Otherwise, the artifact removal process might introduce another type of artifact, called the staircase effect, due to the property of redescending functions [20], [22] ....
D. C. Hoaglin, F. Mosteller, and J. W. Tukey, Understanding Robust and Exploratory Data Analysis. New York: Wiley, 1983.
.... because product metrics data typically have a few extreme values that may exert strong influence on the analysis results (e.g., see [43]). [Equation: absolute deviation from the mean; garbled in extraction] An even more robust measure of dispersion that can be used is the median absolute deviation [23], which is: [Equation: absolute deviation from the median; garbled in extraction] 2.3.4 Nearest Neighbor. It can be expected that the number of nearest neighbors that are used as the basis for prediction will have an impact on the prediction performance. Previous software engineering studies with ....
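The contrast drawn above between a mean-based and a median-based measure of dispersion can be illustrated with a minimal sketch. The function names and the sample data are hypothetical, and the formulas are the textbook definitions rather than the exact (garbled) expressions of the cited paper.

```python
from statistics import median


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean (textbook definition)."""
    center = sum(values) / len(values)
    return sum(abs(v - center) for v in values) / len(values)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical product metric with a single extreme value.
metric = [1, 2, 3, 4, 100]
print(mean_absolute_deviation(metric))    # pulled upward by the extreme value
print(median_absolute_deviation(metric))  # stays close to the typical spread
```

The single extreme value dominates the mean-based measure, while the median-based measure is essentially unchanged by it.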
D. Hoaglin, F. Mosteller, and J. Tukey (eds.): Understanding Robust and Exploratory Data Analysis. Wiley, 1983.
....in Step 3, pick the median from the slopes of all possible pairs of the cumulative minima: output it as the estimate of 1. Otherwise, the algorithm concludes that there is no skew and outputs zero. The core of Paxson's algorithm is a robust line-fitting technique based on robust statistics [18]. It uses the median as a robust estimate for the slope. As mentioned in [46, 47], robust line fitting alone fails in estimating the slope of the trend due to the high variability in OTTs, and that is why the de-noised OTTs and cumulative minima are used in his algorithm. 2.5.2 Linear ....
Hoaglin, D., Mosteller, F., and Tukey, J., Eds. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons, 1983.
....results obtained are tested for quality, using a variety of measures. The technique scales to large datacubes and proves to give a good approximation of the results that would have been obtained by median polish in the original data. 1 Introduction. Exploratory Data Analysis (EDA) is a technique [7, 8] that uncovers structure in data. EDA is performed without any a priori hypothesis in mind: rather, it searches for exceptions of the data values relative to what those values would have been if anticipated by a statistical model. This statistical model fits the rest of the data values rather ....
....multidimensional data abstraction, where aggregated measures of the combinations of dimension values are kept. At any level of aggregation, datacubes can be viewed as multi-way tables that can be subjected to EDA. Two traditional ways of performing EDA in tables are median polish and mean polish [8]. Both methods try to fit a model (additive or multiplicative) by operating on the data table, finding and subtracting medians (means) along each dimension of the table. Starting with one dimension, the method calculates the median (mean) of each row 1 and subtracts this value from every ....
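The row and column sweeps described above can be sketched for a two-way table as follows. This is a minimal illustration of the additive median polish, assuming a fixed number of sweeps instead of a convergence test; the function name `median_polish` and its parameters are hypothetical and are not taken from the cited paper.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Fit table[i][j] ~ overall + row_eff[i] + col_eff[j] by sweeping medians.

    Returns (overall, row_eff, col_eff, residuals). The number of sweeps is
    fixed here for simplicity; this is an assumption of the sketch.
    """
    resid = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(n_iter):
        # Row sweep: subtract each row's median and add it to the row effect.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Column sweep: subtract each column's median, add it to the column effect.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]

    return overall, row_eff, col_eff, resid
```

At every step the identity `table[i][j] == overall + row_eff[i] + col_eff[j] + resid[i][j]` is preserved (up to floating-point rounding), so large residuals mark the cells that the additive model does not explain. Replacing `median` with a mean gives the mean polish variant.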
[Article contains additional citation context not shown here]
D.C. Hoaglin, F. Mosteller, and J.W. Tukey. Understanding Robust and Exploratory Data Analysis. Wiley, 1986.
....sharp changes of orientation according to the function $\rho_\sigma$. If $\rho_\sigma(x) = x^2$, the functional is the same as used by Horn and Brooks [1]. However, any other function may be used as the regularization term, and we have investigated several robust measures, including the classical Tukey [5] and Huber [9] and the Adaptive Prior Potential Functions of Li [13]. We also introduced [25] a continuous version of the piecewise Huber robust estimator, $\rho_\sigma(x) = \sigma \log\cosh(x/\sigma)$, and found that this yielded the best results by offering a compromise between oversmoothing ....
D. Hoaglin, F. Mosteller, and J. Tukey. Understanding robust and exploratory data analysis. Wiley, New York, 1983.
....also expected to be incorrect, although that is not always the case. One aim of statistical research is to find ways to weaken the assumptions necessary for good estimation. Robust statistics looks for estimators that work satisfactorily for larger families of distributions; resilient statistics [3] concern estimators, often order statistics, that typically have small errors when assumptions are violated. A more Bayesian approach to the problem of estimation under assumptions emphasizes that alternative models and their competing assumptions are often plausible. Rather than making an ....
Hoaglin, D., Mosteller, F., and Tukey, J. Understanding Robust and Exploratory Data Analysis. Wiley, New York, 1983.
....This representation scheme is translation, orientation, and scale independent. During the semantic conversion process, each feature point of a feature image is considered as a reference point to determine the orientation or the principal axes of the feature image, which is found using the method of [12]. Once the orientation of the principal axes and the centroid of the feature image are available, the spatial positions of the individual feature points, with respect to the new origin and coordinate axes, can be computed (Figure 2). 3. Indexing and Image Retrieval. Based on earlier mentioned ....
D. C. Hoaglin, F. Mosteller, and J. W. Tukey. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons, Inc., 1983.
....This representation scheme is translation, orientation, and scale independent. During the semantic conversion process, each feature point of a feature image is considered as a reference point to determine the orientation or the principal axes of the feature image, which is found using the method of [17]. Once the orientation of the principal axes and the centroid of the feature image are available, the spatial positions of the individual feature points, with respect to the new origin and coordinate axes, can be computed (Figure 4). [Figure 4: feature points a through e with their coordinates] ....
David C. Hoaglin, Frederick Mosteller, and John W. Tukey. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons, Inc., 1983.
....by modifying the process of updating the registration. This requires a solution to the absolute orientation problem, for which Horn's method provides a common least-squares solution. To obtain an M-estimate of absolute orientation, we use an iteratively reweighted least squares modification [7, 6] of Horn's method [8]. The scale parameter G in Equation (3) is estimated, following Rousseeuw [14], as a function of the parameters by using the median of absolute deviations of the residuals: [equation garbled in extraction] ....
D. C. Hoaglin, F. Mosteller, and J. W. Tukey, editors. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons, 1983.
....[2] The minimization is via a two-stage iterative process (Picard iteration). After initializing the $K$ cluster centroids, first one calculates the distances $d_{kn}^2$, followed by cluster membership values $u_{kn}$, $1 \le n \le N$, $1 \le k \le K$, from $u_{kn} = \left[ \sum_{j=1}^{K} (d_{kn}/d_{jn})^{2/(m-1)} \right]^{-1}$. [3] The second stage consists of updating the cluster centroids $v_k$, using the relationship: $v_k = \left( \sum_{n=1}^{N} (u_{kn})^m x_n \right) \left( \sum_{n=1}^{N} (u_{kn})^m \right)^{-1}$, $1 \le k \le K$. [4] Equations [2], [3], and [4] are iterated alternately until $J_m$ of Eq. [1] converges. The FC variant that we've ....
....values $u_{kn}$, $1 \le n \le N$, $1 \le k \le K$, from $u_{kn} = \left[ \sum_{j=1}^{K} (d_{kn}/d_{jn})^{2/(m-1)} \right]^{-1}$. [3] The second stage consists of updating the cluster centroids $v_k$, using the relationship: $v_k = \left( \sum_{n=1}^{N} (u_{kn})^m x_n \right) \left( \sum_{n=1}^{N} (u_{kn})^m \right)^{-1}$, $1 \le k \le K$. [4] Equations [2], [3], and [4] are iterated alternately until $J_m$ of Eq. [1] converges. The FC variant that we've developed is a significant reformulation of the classical Bezdek algorithm, expressly optimized for computational efficiency. The improvements are at both the coding and algorithmic levels. For example, we ....
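The two-stage iteration described above (memberships from distances, then centroids from memberships) can be sketched as below. This is a plain illustration of the classical update equations, not the optimized variant the excerpt refers to. The function names are hypothetical, a fixed iteration count stands in for the convergence test on the objective, and a small floor on the squared distance is an added assumption to avoid division by zero.

```python
def memberships(points, centroids, m):
    """Stage 1: u[k][n] = 1 / sum_j (d_kn / d_jn) ** (2 / (m - 1))."""
    # Squared Euclidean distances, floored to avoid division by zero (assumption).
    d2 = [[max(sum((p - c) ** 2 for p, c in zip(x, v)), 1e-12) for x in points]
          for v in centroids]
    power = 1.0 / (m - 1.0)  # applied to squared distances, hence 1/(m-1)
    K, N = len(centroids), len(points)
    return [[1.0 / sum((d2[k][n] / d2[j][n]) ** power for j in range(K))
             for n in range(N)]
            for k in range(K)]


def update_centroids(points, u, m):
    """Stage 2: v_k = sum_n u_kn^m x_n / sum_n u_kn^m."""
    dim = len(points[0])
    new_centroids = []
    for weights in u:
        w = [w_kn ** m for w_kn in weights]
        total = sum(w)
        new_centroids.append(tuple(
            sum(w_n * x[a] for w_n, x in zip(w, points)) / total
            for a in range(dim)))
    return new_centroids


def fuzzy_c_means(points, centroids, m=2.0, n_iter=50):
    """Alternate the two stages for a fixed number of iterations."""
    for _ in range(n_iter):
        u = memberships(points, centroids, m)
        centroids = update_centroids(points, u, m)
    return centroids, memberships(points, centroids, m)


# Hypothetical 1-D data with two well-separated groups.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, u = fuzzy_c_means(data, [(1.0,), (4.0,)])
print(centers)
```

For each point the memberships across clusters sum to one, which follows directly from the form of the membership equation.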
[Article contains additional citation context not shown here]
Hoaglin, D.C., Mosteller, F., and Tukey, J.W. Understanding robust and exploratory data analysis. 1983. John Wiley & Sons, Inc., New York.
....obtained are tested for quality, using a variety of measures. The technique scales to large datacubes and proves to give a good approximation of the results that would have been obtained by median polish in the original data. 1 Introduction. Exploratory Data Analysis (EDA) is a technique [13, 14, 22] that uncovers structure in data. EDA is performed without any a priori hypothesis in mind: rather, it searches for exceptions of the data values relative to what those values would have been if anticipated by a statistical model. This statistical model fits the rest of the data values rather ....
....that specify other aggregation levels, e.g., stores can be grouped individually, by city, by state, etc. At any level of aggregation, datacubes can be viewed as multi-way tables that can be subjected to EDA. Two traditional ways of performing EDA in tables are median polish and mean polish [14]. Both methods try to fit a model (additive or multiplicative) by operating on the data table, finding and subtracting medians (means) along each dimension of the table. Starting with one dimension, the method calculates the median (mean) of each row 1 and subtracts this value from every ....
[Article contains additional citation context not shown here]
D.C. Hoaglin, F. Mosteller, and J.W. Tukey. Understanding Robust and Exploratory Data Analysis. Wiley, 1986.
....one or two RTTs that are much higher than the remainder. These extreme values greatly skew the sample mean and variance, so that the resulting summaries do not accurately reflect typical behavior. To address these sorts of problems, statisticians have developed the field of robust statistics [HMT83]. These are statistics that remain resilient in the presence of extremes, or outliers. One example is use of the median, or 50th percentile, as a statistic for summarizing a distribution's central location, rather than the mean. Unlike the mean, the median is virtually unaffected by the presence ....
....deviation. One other technique we borrow from robust statistics is that of fitting a line to a series of ⟨x, y⟩ points. Techniques such as least squares can be heavily skewed by trying to minimize the squared distance between the fitted line and any outliers. The technique we use, taken from [HMT83], is to first estimate the slope of the line as the median of all of the pairwise slopes between the different points, and then estimate the intercept as the median of the offset of the y coordinates from a line with the given slope and zero intercept. 9.2 An overview of TCP. In this section we ....
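The line-fitting procedure described above (median of pairwise slopes, then median of the offsets) can be sketched in a few lines. The function name `robust_line_fit` and the sample points are hypothetical; skipping pairs with equal x coordinates is an assumption of this sketch, since the excerpt does not say how such pairs are handled.

```python
from itertools import combinations
from statistics import median


def robust_line_fit(points):
    """Fit y = slope * x + intercept using medians instead of least squares."""
    # Slope: median of the slopes between every pair of points.
    # Pairs with identical x are skipped (an assumption of this sketch).
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    slope = median(slopes)
    # Intercept: median offset of each y from a zero-intercept line of that slope.
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Hypothetical points on y = 2x + 1 with one gross outlier at x = 3.
pts = [(0, 1), (1, 3), (2, 5), (3, 100), (4, 9), (5, 11)]
print(robust_line_fit(pts))
```

Because only 5 of the 15 pairwise slopes involve the outlier, the median slope is determined entirely by the well-behaved pairs. The number of pairs grows quadratically with the number of points, so this direct form suits small samples.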
D. Hoaglin, F. Mosteller, and J. Tukey, Ed., "Understanding Robust and Exploratory Data Analysis," John Wiley & Sons, 1983.
....$-\rho(t)$ is bounded. This immediately implies (C). See also Mizera, 1996. Assumptions (i) and (ii) are satisfied, for instance, by the Cauchy log likelihood $\rho(u) = \log(1 + u^2)$. Another example is provided by the slash likelihood (for the definition, see Morgenthaler and Tukey, 1991, or Hoaglin, Mosteller and Tukey, 1983). For convex $\rho$, (C) can be verified by observing that ($c$ denotes the limit of $\psi$ at $\infty$) for $s, t \ge 0$, $\rho(s+t) - \rho(s) = \int_s^{s+t} \psi(u)\,du \le ct = \int_0^t \psi(u)\,du + \int_0^t (c - \psi(u))\,du \le \int_0^t \psi(u)\,du + \int_0^\infty (c - \psi(u))\,du = \rho(t) + L$, provided that $\int_0^\infty (c - \psi(u))\,du = L$ ....
Hoaglin, D. C., Mosteller, F. and Tukey, J. W. (eds.) (1983). Understanding Robust and Exploratory Data Analysis. Wiley, New York.
....cluster may partially destroy the structure of other clusters, or we might get bridging fits [33]. Fig. 2(a) shows one such noisy data set with two crossing clusters. The algorithm we propose is designed to overcome this drawback. Moreover, all the current algorithms use hard finite rejection [34], i.e., points within an inlier bound are given a weight of 1, and points outside the bound are given a weight of zero. This means that these algorithms do not handle the region of doubt [21] very well. To overcome this problem, we use smooth [34, 21] or fuzzy rejection, where the weight ....
....the current algorithms use hard finite rejection [34], i.e., points within an inlier bound are given a weight of 1, and points outside the bound are given a weight of zero. This means that these algorithms do not handle the region of doubt [21] very well. To overcome this problem, we use smooth [34, 21] or fuzzy rejection, where the weight function drops to zero gradually. 3 The Robust Competitive Agglomeration (RCA) algorithm. 3.1 Algorithm Development. Let $X = \{x_j \mid j = 1, \ldots, N\}$ be a set of $N$ vectors in an $n$-dimensional feature space with coordinate axis labels $(x_1, \cdots$ ....
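The difference between hard and smooth rejection can be made concrete with two weight functions of a residual. The hard rule follows the description above. For the smooth rule, the excerpt does not give its weight function, so Tukey's biweight is swapped in here purely as a familiar example of a weight that drops to zero gradually. The function names and the `bound` parameter are hypothetical.

```python
def hard_rejection_weight(residual, bound):
    """Weight 1 inside the inlier bound, 0 outside (hard finite rejection)."""
    return 1.0 if abs(residual) <= bound else 0.0


def smooth_rejection_weight(residual, bound):
    """Tukey biweight: falls gradually from 1 at zero residual to 0 at the bound.

    Used only as an illustrative smooth weight; it is not claimed to be the
    weight function of the algorithm in the excerpt.
    """
    r = abs(residual) / bound
    return (1.0 - r * r) ** 2 if r < 1.0 else 0.0


for residual in (0.0, 0.5, 0.9, 1.1):
    print(residual,
          hard_rejection_weight(residual, 1.0),
          smooth_rejection_weight(residual, 1.0))
```

Points near the bound, the region of doubt, receive an intermediate weight under the smooth rule instead of an all-or-nothing decision.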
D. C. Hoaglin, F. Mosteller, and J. W. Tukey, Eds., Understanding Robust and Exploratory Data Analysis, Wiley, New York, 1983.
....on first contact with data. One must interpret suggestive features of the data, observe patterns these features indicate, and generate hypotheses to explain the patterns. Successive steps through the process can lead gradually to a better understanding of underlying structure in the data [Hoaglin et al., 1983; Good, 1983]. Exploratory data analysis (EDA) encompasses a wide range of statistical tools [Tukey, 1977]. Simple exploratory results include histograms that describe discrete and continuous variables, schematic plots that give general characterizations of relationships, partitions of ....
Hoaglin, David C.; Mosteller, Frederick; and Tukey, John W. 1983. Understanding robust and exploratory data analysis. Wiley.
....to handle; anything that looks below the previously described surface makes the description more effective [25, p. v]. These strategic considerations guide our application of specific techniques for fitting lines, examining residuals, and so forth. Though most introductions to the field of EDA [27, 11, 12, 6] lay heavy emphasis on appropriate statistical techniques, none slights the importance of the strategies for their use. In modern computer-based statistics packages we find a rich set of operations, suitable for almost any EDA application. These systems are nevertheless limited; they are almost ....
....summarize the relationship with respect to Duration by a table of medians, as shown in Table 2. This gives a close view of the behavior of Duration as WindSpeed and PlanType take on different values. We can analyze this behavior in more detail if desired, using a median polish or similar technique [11]. This brief account gives the flavor of EDA. To see how an automated assistant might contribute to the process, consider the following dialog, which describes a detailed portion of the analysis above. User: Select relationship (Effort, Duration). Aide: (Effort, Duration) has these indications: ....
David C. Hoaglin, Frederick Mosteller, and John W. Tukey. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons, Inc., 1983.
....In the quadratic case, this becomes the update equation used by Horn and Brooks [1]. However, any other function may be used as the regularization term, and we have investigated several robust measures, including the classical Tukey [5] and Huber [9] and the Adaptive Prior Potential Functions of Li [13]. We also introduced [25] a continuous version of the piecewise Huber robust estimator, described by [equation (3); garbled in extraction], and found that this yielded the best results by offering a compromise between ....
D.C. Hoaglin, F. Mosteller, and J.W. Tukey. Understanding robust and exploratory data analysis. Wiley, New York, 1983.
....the situation is also less serious precisely because it involves only a single value. As for the "how," data point outliers can be identified through various mechanisms. First, there are arbitrary rules such as 3 SDs. Second, there are procedures, such as those based on box plot analysis (e.g., Hoaglin, Mosteller, & Tukey, 1985) or robust statistics (Wilcox, 1997; Huber, 1981), that routinely trim 10-20 percent of the most extreme data points regardless of how extreme they are. Unlike the data point outlier, a case outlier can be identified as soon as there is a case to be identified. For example, a variety of screening ....
Hoaglin, D.C., Mosteller, F. & Tukey, J.W. (1985). Understanding Robust and Exploratory Data Analysis. New York: Wiley.
....[Figure 3.1: A basic boxplot. Labels: minimum, lower outliers, lower outlier cutoff, lower quartile, median, upper quartile, upper outlier cutoff, upper outliers, maximum; axis: data (appropriately scaled).] The five-number summary of a data distribution is often visualized as a boxplot. Figure 3.1 shows a basic boxplot [28]. It has the following elements: • The rectangular box has ends at the lower quartile and the upper quartile. It covers 50% of the data. • The crossbar inside the box is the median. • The line from each end of the box extends to an outlier cutoff bar (tail), which is the most remote point that is ....
....resistant to the impact of exceptional data, as the median and the quartiles are resistant to wild data values. Because of these advantages, the boxplot has become one of the most favored visualization tools in statistical analysis [14]. Boxplots are particularly powerful with data from several groups [28]. We can draw one boxplot for each group and compare them with one another to see how the groups differ. In our multi-dimensional data cube, we often want to compare the data from different group-bys. The boxplot is an ideal data analysis tool in such cases. Therefore we would like our data mining ....
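The quantities a boxplot displays can be computed as in the following sketch. The excerpt is cut off before it states how the outlier cutoffs are defined, so the common convention of 1.5 times the interquartile range beyond each quartile is assumed here. The quartile interpolation rule (Python's "inclusive" method), the function name `boxplot_summary`, and the sample data are likewise assumptions.

```python
from statistics import median, quantiles


def boxplot_summary(data, k=1.5):
    """Five-number summary plus outlier cutoffs, whisker ends, and outliers.

    The cutoffs use the k * IQR convention (k = 1.5 by default), which is an
    assumption of this sketch.
    """
    values = sorted(data)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_cut, high_cut = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_cut <= v <= high_cut]
    return {
        "min": values[0],
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": values[-1],
        # Whiskers end at the most remote points still inside the cutoffs.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in values if v < low_cut or v > high_cut],
    }


# Hypothetical group with one wild value.
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Calling the function once per group gives the side-by-side comparison the text describes, since the quartile-based quantities change little when a group contains a few wild values.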
[Article contains additional citation context not shown here]
D. Hoaglin, F. Mosteller, and J. Tukey. Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, 1983.
No context found.
Hoaglin, D. C., Mosteller, F., Tukey, J. W. (1983). (Eds.) Understanding Robust and Exploratory Data Analysis. Wiley, N. Y.
No context found.
D. C. Hoaglin, F. Mosteller, and J. W. Tukey, Understanding Robust and Exploratory Data Analysis. New York, New York: John Wiley & Sons, 1983.
No context found.
Hoaglin, D.C., F. Mosteller, and J.W. Tukey, eds. Understanding robust and exploratory data analysis. 1983, Wiley: New York. 447.
No context found.
Hoaglin, D., Mosteller, F., and J. Tukey 1983. Understanding Robust and Exploratory Data Analysis, John Wiley and Sons, New York, 447 pp.