Results 1–10 of 49
Estimating a Collective Household Model with Survey Data on Financial Satisfaction, IFS working paper 06/19
Social Indicators Research, 2006
Abstract
Cited by 18 (0 self)
Using Loglinear Models to Compress Datacubes
In WAIM '00: Proceedings of the First International Conference on Web-Age Information Management, 1999
"... A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage, so, ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage, so compressing them is of practical importance. In this paper, we propose an approximation technique that reduces the storage cost of the cube at the price of returning approximate answers to the queries posed against the cube. The idea is to characterize regions of the cube by statistical models whose descriptions take less space than the data itself. The model parameters can then be used to estimate the cube cells with a certain level of accuracy. To increase the accuracy, some of the "outliers," i.e., the cells that incur the largest errors when estimated, are retained. The storage taken by the model parameters and the retained cells should, of course, be a fraction of the space of the...
SPIDER Retrieval System at TREC-5
In Proc. of TREC-5, 1996
"... The ETH group participated in this year`s TREC in the following tracks: automatic adhoc (long and short), the manual adhoc, routing, and confusion. We also did some experiments on the chinese data which were not submitted. While for adhoc we relied mainly on methods which were well evaluated in prev ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
The ETH group participated in this year's TREC in the following tracks: automatic ad hoc (long and short), manual ad hoc, routing, and confusion. We also ran some experiments on the Chinese data which were not submitted. While for ad hoc we relied mainly on methods that were well evaluated in previous TRECs, we successfully tried completely new techniques for the routing and confusion tasks: for routing, we found an optimal feature-selection method and included co-occurrence data in the retrieval function; for confusion, we applied a robust probabilistic technique for estimating feature frequencies.
Mining lesion-deficit associations in a brain image database
ACM SIGKDD, 1999
"... We present a data mining process for discovering associations between structures and functions of the human brain. Our approach is through the study of lesioned (abnormal) structures and associated functional deficits (disorders). For this purpose we have developed a BRAinImage Database (BRAID) tha ..."
Abstract

Cited by 14 (4 self)
 Add to MetaCart
We present a data mining process for discovering associations between structures and functions of the human brain. Our approach is through the study of lesioned (abnormal) structures and associated functional deficits (disorders). For this purpose we have developed a BRAin Image Database (BRAID) that integrates image processing and visualization capabilities with statistical analysis of spatial and clinical data, providing access via extended SQL through a web interface. We present visualization and statistical methods for mining lesion-deficit associations, and consider issues of scalability and morphological variability. We demonstrate the use of the proposed mining methods by applying them to epidemiological data, finding clinically meaningful associations.
Rainfall modelling using a latent Gaussian variable
In Modelling Longitudinal and Spatially Correlated Data: Methods, Applications, and Future Directions, 1997
"... ABSTRACT A monotonic transformation is applied to hourly rainfall data to achieve marginal normality. This de nes a latent Gaussian variable, with zero rainfall corresponding to censored values below a threshold. Autocorrelations of the latent variable are estimated by maximum likelihood. The goodne ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
(Show Context)
A monotonic transformation is applied to hourly rainfall data to achieve marginal normality. This defines a latent Gaussian variable, with zero rainfall corresponding to censored values below a threshold. Autocorrelations of the latent variable are estimated by maximum likelihood. The goodness of fit of the model to Edinburgh rainfall data is comparable with that of existing point-process models. Gibbs sampling is used to disaggregate daily rainfall data, i.e., to generate typical hourly data conditional on daily totals. Key words and phrases: Gibbs sampling, normalising transformation, rainfall disaggregation, time series.
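The censoring construction can be sketched as follows, a toy version that uses empirical normal scores in place of the paper's fitted monotonic transformation (function names are hypothetical):

```python
import numpy as np
from scipy import stats

def latent_gaussian(rain):
    """Toy latent-Gaussian construction for rainfall: wet hours get empirical
    normal scores; dry (zero) hours are censored below the latent threshold."""
    rain = np.asarray(rain, dtype=float)
    wet = rain > 0
    p_dry = 1.0 - wet.mean()
    c = stats.norm.ppf(p_dry)               # latent censoring threshold
    z = np.full(rain.shape, np.nan)         # NaN marks a censored (dry) hour
    ranks = stats.rankdata(rain[wet])       # ranks of the wet observations
    # Map wet ranks to probabilities in (p_dry, 1), then to normal quantiles
    z[wet] = stats.norm.ppf(p_dry + (1 - p_dry) * ranks / (wet.sum() + 1))
    return z, c
```

Every wet observation maps above the threshold `c` and every dry hour is left censored below it, which is what makes maximum-likelihood estimation of the latent autocorrelations a censored-data problem.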
Workload Characterization on a Production Hadoop Cluster: A Case Study on Taobao
"... Abstract—MapReduce is becoming the stateoftheart computing paradigm for processing largescale datasets on a large cluster with tens or thousands of nodes. It has been widely used in various fields such as ecommerce, Web search, social networks, and scientific computation. Understanding the char ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
(Show Context)
MapReduce is becoming the state-of-the-art computing paradigm for processing large-scale datasets on clusters with tens to thousands of nodes. It has been widely used in fields such as e-commerce, Web search, social networks, and scientific computation. Understanding the characteristics of MapReduce workloads is the key to making better configuration decisions and improving system throughput. However, workload characterization of MapReduce, especially in a large-scale production environment, has not been well studied. To gain insight into MapReduce workloads, we collected a two-week workload trace from a 2,000-node Hadoop cluster at Taobao, the biggest online e-commerce enterprise in Asia, ranked 14 ...
Residual-Based Shadings for Visualizing (Conditional) Independence
2005
"... Residualbased shadings for enhancing mosaic and association plots to visualize independence models for contingency tables are extended in two directions: (a) perceptually uniform HueChromaLuminance (HCL) colors are used and (b) the result of an associated significance test is coded by the appear ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
(Show Context)
Residual-based shadings for enhancing mosaic and association plots to visualize independence models for contingency tables are extended in two directions: (a) perceptually uniform Hue-Chroma-Luminance (HCL) colors are used, and (b) the result of an associated significance test is coded by the appearance of color in the visualization. For (a), a general strategy for deriving diverging palettes in the perceptually based HCL space is suggested. For (b), cutoffs that control the appearance of color are computed in a data-driven way based on the conditional permutation distribution of maximum-type test statistics. The shadings are first established for the case of independence in two-way tables and then extended to more general independence models for multi-way tables, including in particular conditional independence models.
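A minimal sketch of direction (a), not the authors' implementation: a diverging palette built directly in HCL coordinates, with chroma collapsing to zero and luminance rising toward a neutral midpoint. The conversion from HCL to RGB is omitted, and all parameter defaults are illustrative.

```python
import numpy as np

def diverging_hcl(n, hues=(260, 0), c_max=80, l_range=(30, 90), power=1.5):
    """Return n (hue, chroma, luminance) triples for a diverging palette:
    saturated, darker ends and a neutral (zero-chroma), light middle."""
    t = np.linspace(-1, 1, n)                # position: -1 .. 0 (middle) .. 1
    a = np.abs(t) ** power                   # distance from the midpoint
    hue = np.where(t < 0, hues[0], hues[1])  # one hue per arm of the palette
    chroma = c_max * a                       # vanishes at the neutral middle
    luminance = l_range[1] - (l_range[1] - l_range[0]) * a
    return np.column_stack([hue, chroma, luminance])
```

Because only chroma and luminance vary within each arm while the hue is held fixed, equal steps along the palette correspond to roughly equal perceptual steps, which is the point of working in HCL rather than RGB or HSV.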
Data mining in brain imaging
Statistical Methods in Medical Research, 2000
"... Data mining in brain imaging is proving to be an effective methodology for disease prognosis and prevention. This, together with the rapid accumulation of massive heterogeneous data sets, motivates the need for efficient methods that filter, clarify, assess, correlate and cluster brainrelated infor ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
Data mining in brain imaging is proving to be an effective methodology for disease prognosis and prevention. This, together with the rapid accumulation of massive heterogeneous data sets, motivates the need for efficient methods that filter, clarify, assess, correlate, and cluster brain-related information. Here, we present data mining methods that have been or could be employed in the analysis of brain images. These methods address two types of brain imaging data: structural and functional. We introduce statistical methods that aid the discovery of interesting associations and patterns between brain images and other clinical data. We consider several applications of these methods, such as the analysis of task activation, lesion-deficit associations, and structural morphological variability; the development of probabilistic atlases; and tumour analysis. We include examples of applications to real brain data. Several data mining issues, such as method validation and verification, are also discussed.
Boolean Regression
Annals of Operations Research, 1994
"... We take a regressionbased approach to the problem of induction, which is the problem of inferring general rules from specific instances. Whereas traditional regression analysis fits a numerical formula to data, we fit a logical formula to boolean data. We can, for instance, construct an expert ..."
Abstract

Cited by 6 (1 self)
 Add to MetaCart
(Show Context)
We take a regression-based approach to the problem of induction, which is the problem of inferring general rules from specific instances. Whereas traditional regression analysis fits a numerical formula to data, we fit a logical formula to boolean data. We can, for instance, construct an expert system by fitting rules to an expert's observed behavior. A regression-based approach has the advantage of providing tests of statistical significance as well as other tools of regression analysis. Our approach can be extended to non-boolean discrete data, and we argue that it is better suited to rule construction than logit and other types of categorical data analysis. We find maximum likelihood and Bayesian estimates of a best-fitting boolean function or formula and show that Bayesian estimates are more appropriate. We also derive confidence and significance levels. We show that finding the best-fitting logical formula is a pseudo-boolean optimization problem, and finding the best...
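As a toy illustration of fitting a logical formula to boolean data, here is an exhaustive search over conjunctions of literals that minimizes misclassifications, a crude stand-in for the paper's maximum-likelihood and pseudo-boolean optimization machinery (names are hypothetical):

```python
from itertools import product

def fit_boolean_conjunction(X, y):
    """Exhaustively search conjunctions of literals for the one that
    misclassifies the fewest observations (a crude best-fitting formula)."""
    n_vars = len(X[0])
    best, best_err = None, len(y) + 1
    # Each variable appears positively (1), negated (-1), or not at all (0)
    for mask in product((1, -1, 0), repeat=n_vars):
        def predict(row, mask=mask):
            return all((v == 0) or (v == 1 and row[i]) or (v == -1 and not row[i])
                       for i, v in enumerate(mask))
        err = sum(predict(row) != bool(t) for row, t in zip(X, y))
        if err < best_err:
            best, best_err = mask, err
    return best, best_err
```

The search space here is only 3^n conjunctions; the paper's point is precisely that the general best-fitting-formula problem is a pseudo-boolean optimization problem, so realistic instances need more than brute force.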