Results 1 - 10 of 5,291,260
The Nature of Statistical Learning Theory
, 1999
"... Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the deve ..."
Abstract

Cited by 12976 (32 self)
Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based
Maximum likelihood from incomplete data via the EM algorithm
 JOURNAL OF THE ROYAL STATISTICAL SOCIETY, SERIES B
, 1977
"... A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situat ..."
Abstract

Cited by 11807 (17 self)
situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
A comparison of approaches to large-scale data analysis
 In SIGMOD ’09: Proceedings of the 35th SIGMOD international conference on Management of data
, 2009
"... There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model ..."
Abstract

Cited by 246 (7 self)
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing
Large-Scale Data Analysis Using Heuristic Methods
, 2010
"... Abstract. Estimation and modelling problems as they arise in many data analysis areas often turn out to be unstable and/or intractable by standard numerical methods. Such problems frequently occur in fitting of large data sets to a certain model and in predictive learning. Heuristics are general re ..."
Abstract

Cited by 1 (0 self)
to be still limited. This paper surveys a set of problem-solving strategies, guided by heuristic information, that are expected to be used more frequently. The use of recent advances in different fields of large-scale data analysis is promoted focusing on applications in medicine, biology and technology.
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
 In Combinatorial Scientific Computing, Chapman and Hall/CRC
, 2011
"... ar ..."
Topics in High-dimensional and Large-scale Data Analysis
"... This thesis concerns the analysis of high-dimensional and large-scale data that have become ubiquitous in today’s information-driven age. It consists of four main chapters. The first studies the problem of variable selection, where out of potentially thousands of measured variables, one wishes to s ..."
Abstract
This thesis concerns the analysis of high-dimensional and large-scale data that have become ubiquitous in today’s information-driven age. It consists of four main chapters. The first studies the problem of variable selection, where out of potentially thousands of measured variables, one wishes
Award Algorithms for Large-Scale Data Analysis.
"... Abstract. In a classical online network design problem, traffic requirements are gradually revealed to an algorithm. Each time a new request arrives, the algorithm has to satisfy it by augmenting the network under construction in a proper way (with no possibility of recovery). In this paper we study ..."
Abstract
Abstract. In a classical online network design problem, traffic requirements are gradually revealed to an algorithm. Each time a new request arrives, the algorithm has to satisfy it by augmenting the network under construction in a proper way (with no possibility of recovery). In this paper we study a natural generalization of online network design problems, where a fraction of the requests (the outliers) can be disregarded. Now, each time a request arrives, the algorithm first decides whether to satisfy it or not, and only in the first case it acts accordingly. We cast three classical network design problems into this framework:
• Online Steiner Tree with Outliers. In this case a set of t terminals that belong to an n-node graph is presented, one at a time, to an algorithm. Each time a new terminal arrives, the algorithm can either discard or select it. In the latter case, the algorithm connects it to the Steiner tree under construction (initially consisting of a given root node). At the end of the process, at least k terminals must be selected.
• Online TSP with Outliers. This is the same problem as above, but with the Steiner tree replaced by a TSP tour.
• Online Facility Location with Outliers. In this case, we are also given a set of facility nodes, each one with an opening cost. Each time a terminal is selected, we have to connect it to some facility (and open that facility, if it is not already open).
Learning Hierarchical Bayesian Networks for Large-Scale Data Analysis
"... Abstract. Bayesian network learning is a useful tool for exploratory data analysis. However, applying Bayesian networks to the analysis of large-scale data, consisting of thousands of attributes, is not straightforward because of the heavy computational burden in learning and visualization. In this ..."
Abstract

Cited by 4 (1 self)
Abstract. Bayesian network learning is a useful tool for exploratory data analysis. However, applying Bayesian networks to the analysis of large-scale data, consisting of thousands of attributes, is not straightforward because of the heavy computational burden in learning and visualization
Parallel index and query for large scale data analysis
 In SC11
, 2011
"... Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific ..."
Abstract

Cited by 10 (7 self)
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific
Parallelisation of Sparse Grids for Large Scale Data Analysis
 In Proceedings of the ICCS, Lecture Notes in Computer Science
, 2003
"... Sparse Grids (SG), due to Zenger, are the basis for efficient high dimensional approximation and have recently been applied successfully to predictive modelling. They are spanned by a collection of simpler function spaces represented by regular grids. The combination technique prescribes how approxi ..."
Abstract

Cited by 5 (3 self)
Sparse Grids (SG), due to Zenger, are the basis for efficient high dimensional approximation and have recently been applied successfully to predictive modelling. They are spanned by a collection of simpler function spaces represented by regular grids. The combination technique prescribes how approximations on simple grids can be combined to approximate the high dimensional functions. It can be improved by iterative refinement.