Frederick Mosteller and John W. Tukey. Data Analysis and Regression. Addison-Wesley, 1977.
....about data manipulation, while still allowing specific decisions to take advantage of local context. Controlling Exploration. In terms of search, exploration can generally be viewed as a local, data-driven process. Specific combinations of indications, or suggestive characteristics of the data [Mosteller77], lead to appropriate actions being taken. Strong skew in a batch of data points, for example, indicates that a transformation for symmetry may be appropriate. An indication of clustering in a relationship may lead to a consideration of local behavior within each cluster. Findings can establish ....
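The step from an indication (strong skew) to an action (try a transformation for symmetry) can be sketched in a few lines. This is a minimal sketch, not the authors' system: it assumes Bowley's quartile skewness as the skew statistic, and the names `quartile_skewness`, `suggest_action` and the cutoff `SKEW_THRESHOLD = 0.3` are hypothetical choices for illustration.

```python
import statistics

# Hypothetical cutoff for "strong" skew; not taken from the source.
SKEW_THRESHOLD = 0.3


def quartile_skewness(batch):
    """Bowley (quartile) skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1), in [-1, 1]."""
    q1, q2, q3 = statistics.quantiles(batch, n=4)
    spread = q3 - q1
    if spread == 0:
        return 0.0
    return (q3 + q1 - 2 * q2) / spread


def suggest_action(batch):
    """Turn the skew indication into a candidate action for the analyst."""
    skew = quartile_skewness(batch)
    if skew > SKEW_THRESHOLD:
        return "right skew: try a log or square-root transformation"
    if skew < -SKEW_THRESHOLD:
        return "left skew: try a square or another power greater than 1"
    return "no transformation indicated"


print(suggest_action([1, 1, 2, 2, 3, 4, 8, 16, 32, 64]))
print(suggest_action([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

A quartile-based measure is used here because, unlike the moment skewness, it is itself resistant to the few extreme values that often produce the skew in the first place.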
Frederick Mosteller and John W. Tukey. Data Analysis and Regression. Addison-Wesley, 1977.
....description of the behavior of the planner as wind speed and plan type change. This brief account gives the flavor of EDA. Note our use of indications in the analysis. Indications are suggestive characteristics of the data, most often involving evaluation of a statistic or descriptive structure [10]. The gap in the distribution of Effort indicates ....

Table 1: Clustered and non-clustered Effort for successful trials
  # Replans                0     1     2     3
  Clustered Effort       113     0     0     0
  Non-clustered Effort    49    95    45    14

  WindSpeed              Low   Medium   High
  PlanType = X           11.0   15.5    18.8
  PlanType = Y           14.5   26.7    22.1
  PlanType ....
Frederick Mosteller and John W. Tukey. Data Analysis and Regression. Addison-Wesley, 1977.
....to develop statistically valid models, always used in assessing goodness of fit. 3.3 Logarithmic Transformations. When analyzing data drawn from distributions unbounded in one direction and bounded in the other, it often helps to re-express the data by applying a logarithmic transformation [MT77]. We found that for many of our models logarithmic transformations were required to discern patterns in the large range of values in the data. For convenience we developed and tested our models using a log2 (base-2) transformation. Note that, when converting from logarithmic models back to ....
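A minimal sketch of the re-expression step, assuming strictly positive data; the helper names `log2_transform` and `back_transform` and the sample values are hypothetical. It also illustrates one standard caveat that is not taken from the truncated sentence above: back-transforming a mean computed on the log scale gives the geometric mean of the original values, not their arithmetic mean.

```python
import math
import statistics


def log2_transform(values):
    """Re-express strictly positive data on a base-2 logarithmic scale."""
    if any(v <= 0 for v in values):
        raise ValueError("log2 requires strictly positive values")
    return [math.log2(v) for v in values]


def back_transform(log_value):
    """Convert a value on the log2 scale back to the original units."""
    return 2.0 ** log_value


data = [1, 2, 4, 8, 1024]           # hypothetical, spans three orders of magnitude
logs = log2_transform(data)         # [0.0, 1.0, 2.0, 3.0, 10.0]
center = statistics.mean(logs)      # 3.2 on the log2 scale
print(back_transform(center))       # geometric mean, roughly 9.19
print(statistics.mean(data))        # arithmetic mean, 207.8
```

On the log2 scale the single large value no longer dominates, which is the sense in which the transformation helps patterns emerge across a large range.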
F. Mosteller and J. W. Tukey, "Data Analysis and Regression", Addison-Wesley, 1977.
....detect outliers by comparing the residual errors of each line with a threshold will not work. Throwing away one line at a time and doing least squares on the remaining subset also does not work when more than one outlier is present. Statisticians have suggested many different "robust" techniques [13, 17, 24, 25, 31] to handle outliers, and these techniques are currently gaining popularity in computer vision [1, 7, 10, 14]. A measure used to analyze these "robust" algorithms is the breakdown point: the smallest fraction of outliers present in the input data which may cause the output estimate to be ....
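To make the breakdown point concrete, the sketch below contrasts an ordinary least-squares line (breakdown point 0: a single bad point can move it arbitrarily far) with the Theil-Sen estimator (breakdown point of about 29%). Theil-Sen is chosen here only as a familiar example of a robust line fit; it is not necessarily one of the techniques cited above. The function names and the synthetic data are hypothetical.

```python
import statistics
from itertools import combinations


def least_squares_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_line(xs, ys):
    """Theil-Sen: median of all pairwise slopes, then the median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


xs = list(range(10))
ys = [2 * x + 1 for x in xs]   # exact line y = 2x + 1
ys[8], ys[9] = 100, 120        # two gross outliers (20% of the data)
print(least_squares_line(xs, ys))  # slope pulled far from 2
print(theil_sen_line(xs, ys))      # (2.0, 1.0)
```

With two outliers out of ten points, 28 of the 45 pairwise slopes are exactly 2, so their median is unaffected; the least-squares slope, by contrast, is dragged to roughly 11.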
....bound) and the actual variance provided by the given method, so that the best possible value is 1. Kim et al. also note that the least mean square estimator in the presence of Gaussian noise has an asymptotic (large-sample) efficiency of 1, while the median's efficiency is only 0.637 [14, 24]. As we shall see, there is a trade-off between algorithms with high breakdown points and those with high efficiency. Finally, most research in robust statistics appears to have been done for linear problems. When applying these techniques to nonlinear problems, another important consideration ....
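The efficiency figures can be checked numerically. Under Gaussian noise the sample mean attains the variance bound, so the ratio var(mean) / var(median) estimates the median's efficiency, whose large-sample limit is 2/π ≈ 0.637. A minimal Monte Carlo sketch; the sample size, number of trials and seed are arbitrary assumptions.

```python
import math
import random
import statistics


def relative_efficiency(n=101, trials=2000, seed=0):
    """Monte Carlo estimate of var(mean) / var(median) for N(0, 1) samples.

    Because the mean attains the variance bound for Gaussian data, this
    ratio approximates the median's efficiency relative to that bound.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)


print(relative_efficiency())  # a finite-sample estimate near the limit below
print(2 / math.pi)            # asymptotic efficiency of the median, about 0.637
```

The simulated value fluctuates with the seed and the number of trials; it illustrates the trade-off (the median buys a 50% breakdown point at the cost of roughly a third of the efficiency) rather than reproducing the cited figure exactly.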
Mosteller, F., and J. W. Tukey, Data Analysis and Regression, Addison-Wesley, Reading, MA, 1977.
....main interest was in the use of cross-validation to select models. Data splitting could be regarded as a rather crude form of cross-validation, but that is certainly not our purpose here. We scrupulously wish to hold out a sample of the data that will not be used for model selection in any way. Mosteller and Tukey (1977, p. 37) also discuss data splitting as a form of cross-validation, but focus on it as a form of model selection, which we wish to avoid here. There are three different tasks we need to perform: 1. Selection of the model. 2. Estimation of the parameters of the selected model. Point predictions can ....
....of the selected model. Point predictions can then be made. 3. Assessment of the variability in the predictions. Conceivably we could split the data three ways and use a different part for each of the above tasks. This has been suggested in passing by Miller (1990, p. 13) and is, perhaps, what Mosteller and Tukey (1977) meant by double cross-validation. I investigated this strategy and found it clearly inferior to any of those discussed below, so henceforth I restrict attention to splitting the data into two parts. Which parts of the data should be used to perform the three tasks above? Most previous authors ....
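The two-part idea can be sketched as: fit on one part, then assess predictions only on a hold-out part that played no role in fitting. In this minimal sketch the model is fixed in advance as a straight line (model selection is not shown), and the 50/50 random split, the synthetic data and the helper names `split_data`, `fit_line` and `holdout_mse` are all assumptions; it does not claim to be the allocation of tasks the author ultimately recommends.

```python
import random
import statistics


def split_data(pairs, holdout_fraction=0.5, seed=0):
    """Randomly partition the data into disjoint working and hold-out parts."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Least-squares straight line, fitted on the working part only."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def holdout_mse(pairs, slope, intercept):
    """Mean squared prediction error on data never used for fitting."""
    return statistics.fmean((y - (intercept + slope * x)) ** 2 for x, y in pairs)


rng = random.Random(1)
data = [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)) for x in range(40)]
working, holdout = split_data(data)
slope, intercept = fit_line(working)
print(slope, intercept, holdout_mse(holdout, slope, intercept))
```

Because the hold-out part is touched only once, at the end, its error is an honest estimate of predictive variability for the chosen model.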
Mosteller, F. and J. Tukey (1977). Data Analysis and Regression. Addison-Wesley.
....in this paper and to make it easier to understand their properties. Much of the material for this paper has been drawn from several general reference books. The three-volume set which presents robust and resistant statistics in the context of exploratory data analysis (Mosteller and Tukey, 1977; Hoaglin et al., 1983; Hoaglin et al., 1985) is extensive and accessible to anyone with a background in basic applied statistics; in particular, Hoaglin et al. (1983) is especially recommended. Perhaps even more accessible, since it is written with meteorologists in mind, although narrower in ....
Mosteller, F., and J. Tukey 1977. Data Analysis and Regression. A Second Course in Statistics, Addison-Wesley, Reading, MA, 588 pp.
.... 36, 50, 61, 62, 81, 82, 84, 116, 117, 118, 129, 143, 144, 148, 200]
Tagging [10, 19, 28, 56, 57, 66, 90, 91, 124, 125, 126, 131, 138, 153, 163, 168, 188]
HMMs [21, 22, 23, 24, 25, 49, 64, 67, 78, 115, 119, 155, 157, 160, 161]
Search [156]
The Inside-Outside Algorithm [85, 86, 136, 137]
Regression [20, 30, 29, 38, 41, 42, 45, 46, 154, 162]
Partial Parsing [6, 7, 8, 9, 11, 37, 43, 47, 48, 51, 52, 53, 57, 58, 112, 65, 69, 70, 71, 72, 73, 74, 75, 76, 88, 100, 101, 102, 103, 104, 107, 110, 113, 114, 120, 121, 127, 132, 133, 134, 140, 142, 145, 147, 149, 152, 163, 164, 165, 166, 169, 178, 182, 186, 190, 191, 192, 194, 195, 196, 197] ....
Frederick Mosteller and John W. Tukey. Data Analysis and Regression. Addison-Wesley Publishing Company, Reading MA, 1977.
....will result in further reduction of communication overhead. An extended study of this technique is presented elsewhere [21]. The following section discusses the extension of the CDM for polynomial regression that has already been applied to large problems [15]. 5 CDM and Regression. Regression [29], like decision tree learning, is a popular data modeling technique. As we noted earlier in Section 3, naive application of standard regression techniques may produce misleading and ambiguous results in a heterogeneous, distributed environment. Earlier, we also saw a simple CDM regression example for ....
.... A, we can form the terms $T_k(x)$, $k \in A$, of the polynomial for each sample and apply the wavelet packet transform to the samples representing each term and to the samples of $f(x)$. Estimates of the local model coefficients, $a_k$, $k \in A$, may be generated using standard regression techniques [29] directly on the wavelet packet transforms, since $S' f(x) \approx \sum_{k \in A} a_k \, S T_k$ and the $S T_k$ are sparse, making them a nearly orthogonal basis for $S f(x)$. Once the coefficients of the local terms have been estimated, the coefficients of the terms containing cross-partition feature variables may be ....
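The core argument, that a linear transform lets the coefficients $a_k$ be estimated by regression in the transform domain, can be sketched with NumPy. This is a minimal sketch under stated assumptions: an ordinary orthonormal Haar transform stands in for the wavelet packet transform, no use is made of sparsity, and the names `haar_transform` and `fit_in_wavelet_domain` and the test polynomial are hypothetical.

```python
import numpy as np


def haar_transform(signal):
    """Orthonormal Haar wavelet transform of a signal whose length is 2**m."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    # Coarsest approximation first, then detail coefficients from coarse to fine.
    return np.concatenate([approx] + details[::-1])


def fit_in_wavelet_domain(terms, f_values):
    """Estimate a_k in f = sum_k a_k T_k by least squares on transformed samples."""
    design = np.column_stack([haar_transform(t) for t in terms])
    target = haar_transform(f_values)
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


x = np.linspace(-1.0, 1.0, 16)                 # 16 = 2**4 samples
terms = [np.ones_like(x), x, x ** 2]           # hypothetical local terms T_k
f = 2.0 + 3.0 * x - 1.5 * x ** 2
print(fit_in_wavelet_domain(terms, f))         # approximately [2, 3, -1.5]
```

Because the transform is linear, $S f = \sum_k a_k S T_k$ holds exactly here, and because it is orthonormal the least-squares solution matches the one obtained on the raw samples; the benefit described in the text comes from the sparsity of the transformed terms, which this sketch does not exploit.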
Frederick Mosteller and John W. Tukey. Data Analysis and Regression. Addison-Wesley, Menlo Park, CA, 1977.
....guessing, and it will improve rapidly as real testing begins. Extrapolate using numbers from past projects. We recommend that anyone doing any sort of data mining read up on the statistical subfield of exploratory data analysis. Two entertaining and influential early books are [Tukey77] and [Mosteller77]. Reality check: you will certainly be wrong. Even if you are careful with your extrapolation, you should realize that the number calculated is certainly wrong. Attaching too much certainty to the numbers, and especially striving for meaningless precision, will lead to madness. Concentrate on ....
Frederick Mosteller and John W. Tukey, Data Analysis and Regression, Addison-Wesley, 1977.
....samples will continue, and extrapolates to establish when (if at all) the current resource requirement will be broken. As illustrated in Figure 6, the extrapolation consists of calculating the regression line of the buffer size B on time t by applying the least-squares method of fitting a line [8]. 4 IMPLEMENTATION. The audio delivery system is implemented in Regis [5], a development environment for component-based, parallel and distributed programs. This section presents a more technical, implementation-oriented description of the system and its dynamic protocol architecture [3]. 4.1 ....
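The extrapolation step can be sketched as: regress B on t by least squares, then solve the fitted line for the time at which it reaches a threshold. This is a minimal sketch, not Regis code; the function names, the use of a single numeric threshold to stand for "the resource requirement", and the sample values are hypothetical.

```python
def regress_b_on_t(ts, bs):
    """Least-squares regression line of buffer size B on time t."""
    n = len(ts)
    mt, mb = sum(ts) / n, sum(bs) / n
    stt = sum((t - mt) ** 2 for t in ts)
    stb = sum((t - mt) * (b - mb) for t, b in zip(ts, bs))
    slope = stb / stt
    return slope, mb - slope * mt


def predicted_crossing(ts, bs, threshold):
    """Time at which the fitted trend reaches `threshold`, or None if it never
    does after the last sample (flat trend, or the crossing lies in the past).
    Assumes `ts` is sorted in increasing order."""
    slope, intercept = regress_b_on_t(ts, bs)
    if slope == 0:
        return None
    crossing = (threshold - intercept) / slope
    return crossing if crossing >= ts[-1] else None


# Hypothetical samples: the buffer drains by 10 units per time step.
print(predicted_crossing([0, 1, 2, 3, 4], [100, 90, 80, 70, 60], 20))  # 8.0
```

Returning `None` covers the "if at all" case in the text: a flat or favorable trend never breaks the requirement under the linear model.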
F. Mosteller and J. W. Tukey. Data Analysis and Regression. Addison-Wesley, 1977. 10 DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS II
....these two variables. We begin by observing that the relationship can be partitioned into two parts: a vertical partition at zero on the Effort axis and a separate, approximately linear partition. We call the gap in the distribution of Effort an indication, a suggestive characteristic of the data [16]. Examining other variables we find that the vertical partition corresponds to trials in which the outcome was Failure. Turning to the Success partition, we note that the correlation is positive, as expected. We fit a line to the Success partition, but note that there are two outliers from the ....
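Finding a gap such as the one in the distribution of Effort can be sketched as splitting a sorted batch at its widest gap between neighboring values. This is a minimal sketch; the function name and the Effort values are hypothetical, and deciding whether a gap is wide enough to count as an indication is left to the analyst (no threshold is applied).

```python
def largest_gap_partition(values):
    """Split a batch at its widest gap between consecutive sorted values.

    Returns (lower part, upper part, gap width).
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        return ordered, [], 0.0
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    width, i = max(gaps)
    return ordered[: i + 1], ordered[i + 1:], width


# Hypothetical Effort values: a cluster at zero and a spread of positive values.
effort = [52, 0, 41, 0, 70, 45, 0, 63, 0, 58]
print(largest_gap_partition(effort))
```

Once the two partitions are separated, each can be examined against other variables (here, trial outcome) and modeled separately, as the passage does with the Success partition.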
Frederick Mosteller and John W. Tukey. Data Analysis and Regression. Addison-Wesley Publishing Company, 1977.
  Data set                      Source                               Page      Cases  Predictors
  ....                                                               70        30     6
  Equal Education Opportunity   Chatterjee and Price (1991)          176       70     3
  Gasoline Mileage              Chatterjee and Price (1991)          261       30     10
  Nuclear Power                 Cox and Snell (1982)                 81        32     10
  Crime                         Cox and Snell (1982)                 170       47     13
  Hald                          Draper and Smith (1981)              630       13     4
  Grades                        Hamilton (1993)                      83        118    3
  Swiss Fertility               Mosteller and Tukey (1977)           550       47     5
  Surgical Unit                 Neter, Wasserman and Kutner (1990)   439, 468  108    4
  Berkeley Study: Girls         Weisberg (1985)                      56        32     10
  Berkeley Study: Boys          Weisberg (1985)                      57        26     10
  Housing                       Weisberg (1985)                      241       27     9
  Highway                       Weisberg (1985)                      206       39     13
B The Up-Down Algorithm. The search can proceed in two directions: Up from each starting model ....
Mosteller, F. and Tukey, J.W. (1977), Data Analysis and Regression, Reading, MA: Addison-Wesley.
....l = 12 0.0 36 14 = 0 And for the schema 0 , X x2f0g f(x) j0 jφ(0 ) P (x) Pi l l = 12 0.64286 36 14 : 3 X x2f0g f(x) j0 jφ(0 ) P (x) Pi l l = 12 0.42857 36 14 : 2 Once again, we get exactly the same numbers as in Figure 6. 5 CDM and Regression. Regression (Mosteller & Tukey, 1977), like decision tree learning, is a form of supervised learning which fits within the CDM framework. A simple CDM regression example with discrete binary feature variables was ....
  h    φ(h)      φ(h)
  0    0.64286   0.42857
  1    0.85714   0.0
  2    0.42857   0.64286
j w j w 0 j 0000 0.64286 0.35714 ....
....[Figure 7: Application of quadrature filters in wavelet packet decomposition.] $a_k$, $k \in \Xi_A$, may be generated using standard regression techniques (Mosteller & Tukey, 1977) directly on the wavelet packet transforms, since $S' f(x) \approx \sum_{k \in \Xi_A} a_k \, S T_k$ and the $S T_k$ are sparse, making them a nearly orthogonal basis for $S f(x)$. Once the coefficients of the local terms have been estimated, the coefficients of the terms containing cross-partition feature variables ....
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression. Menlo Park, CA: Addison-Wesley.
....to scale up. A survey of methods of scaling up inductive learning algorithms is presented in (Provost & Venkateswarlu, 1998). 2.3 Overview of Multivariate Regression. Multivariate regression (MR) is a widely used data analysis technique owing to its ease of use and intuitive theoretical basis (Mosteller & Tukey, 1977; Flury & Riedwyl, 1988). MR involves fitting a parametric function model to a set of data. In this sense, it is a form of inductive supervised learning. The functions analyzed have the form $f = b_1 r_1 + b_2 r_2 + \cdots + b_k r_k$, where the $b_i$ are constant coefficients and the $r_i$ terms are ....
....the non-zero wavelet coefficients in each partition without recourse to information exchange. In the general case, some information, in the form of wavelet coefficients, will need to be communicated among partitions in order to resolve nonlinear terms. 4 Multivariate Linear Regression. Regression (Mosteller & Tukey, 1977) is a form of supervised learning that is applicable to CDM. In this section one approach to distributed multivariate regression, based on an orthogonal wavelet basis, is presented. This section begins with a description of the method used to generate local models, followed by the method for ....
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression. Menlo Park, CA: Addison-Wesley.
No context found.
Mosteller, C. F., and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
No context found.
Mosteller, C. F. and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
No context found.
Mosteller, C. F., and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
No context found.
Mosteller, F. and J. Tukey (1977) 'Data analysis and regression', Addison-Wesley, Reading, MA.
No context found.
Mosteller, F. and Tukey, J.W. (1977), Data Analysis and Regression, Reading: Addison-Wesley.
No context found.
Mosteller, F. and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
No context found.
Mosteller, C. F. and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
No context found.
Mosteller, F., and Tukey, J. W. (1977), Data Analysis and Regression, Reading, MA: Addison-Wesley.
No context found.
F. Mosteller and J.W. Tukey. Data Analysis and Regression. Addison-Wesley, Reading, Massachusetts, 1977.
No context found.
Mosteller, F. and Tukey, J. W. Data Analysis and Regression. Addison-Wesley, Reading, Mass.
No context found.
Mosteller, C. F., and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
CiteSeer.IST - Copyright Penn State and NEC