Results 1 - 10 of 52,101
Knowledge acquisition via incremental conceptual clustering
- Machine Learning
, 1987
"... Conceptual clustering is an important way of summarizing and explaining data. However, the recent formulation of this paradigm has allowed little exploration of conceptual clustering as a means of improving performance. Furthermore, previous work in conceptual clustering has ..."
Cited by 765 (9 self)
not explicitly dealt with constraints imposed by real world environments. This article presents COBWEB, a conceptual clustering system that organizes data so as to maximize inference ability. Additionally, COBWEB is incremental and computationally economical, and thus can be flexibly applied in a variety
A lattice conceptual clustering system and its application to browsing retrieval
- Machine Learning
, 1996
"... The theory of concept (or Galois) lattices provides a simple and formal approach to conceptual clustering. In this paper we present GALOIS, a system that automates and applies this theory. The algorithm utilized by GALOIS to build a concept lattice is incremental and efficient, each update ..."
Cited by 92 (8 self)
Knowledge management and knowledge management systems: Conceptual foundations and an agenda . . .
, 1998
GPFS: A Shared-Disk File System for Large Computing Clusters
- In Proceedings of the 2002 Conference on File and Storage Technologies (FAST
, 2002
"... GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community ove ..."
Cited by 521 (3 self)
MapReduce: Simplified data processing on large clusters.
- In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04)
, 2004
"... MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of ..."
Cited by 3439 (3 self)
Cluster analysis and display of genome-wide expression patterns
- Proc. Natl. Acad.
, 1998
"... A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and th ..."
Cited by 2895 (44 self)
Clustering by passing messages between data points
- Science
, 2007
"... Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initi ..."
Cited by 696 (8 self)
so in less than one-hundredth the amount of time. Clustering data based on a measure of similarity is a critical step in scientific data analysis and in engineering systems. A common approach is to use data to learn a set of centers such that the sum of
The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations
- Journal of Personality and Social Psychology
, 1986
"... In this article, we attempt to distinguish between the properties of moderator and mediator variables at a number of levels. First, we seek to make theorists and researchers aware of the importance of not using the terms moderator and mediator interchangeably by carefully elaborating, both conceptua ..."
Cited by 5736 (8 self)
conceptually and strategically, the many ways in which moderators and mediators differ. We then go beyond this largely pedagogical function and delineate the conceptual and strategic implications of making use of such distinctions with regard to a wide range of phenomena, including control and stress
Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results
, 1996
"... We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This resul ..."
Cited by 480 (5 self)
This result provides evidence validating the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents. We describe a system employing Scatter/Gather and demonstrate that users are able to use this system close to its full potential.
The Google File System
- ACM SIGOPS Operating Systems Review
, 2003
"... We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While s ..."
Cited by 1501 (3 self)
data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications