Results 1 - 10 of 1,940
The structure and function of complex networks
- SIAM Review, 2003
"... Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, ..."
Abstract
-
Cited by 2600 (7 self)
- Add to MetaCart
(Show Context)
Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.
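As a hedged illustration of two of the concepts the review covers, degree distributions and clustering on a graph grown by preferential attachment, the sketch below uses networkx on a synthetic Barabasi-Albert graph; the graph size, attachment parameter, and seed are arbitrary choices, not values from the review.

```python
# A hedged sketch (not from the review): degree distribution and clustering
# on a synthetic preferential-attachment graph. Graph size, attachment
# parameter m, and the seed are arbitrary illustrative choices.
import networkx as nx
from collections import Counter

# Barabasi-Albert growth: each new node attaches to m = 3 existing nodes
# with probability proportional to their current degree.
G = nx.barabasi_albert_graph(n=10_000, m=3, seed=42)

# Empirical degree distribution P(k): fraction of nodes with degree k.
degree_counts = Counter(d for _, d in G.degree())
n = G.number_of_nodes()
p_k = {k: c / n for k, c in sorted(degree_counts.items())}

# Average clustering coefficient: how often a node's neighbours are
# themselves connected to each other.
avg_clustering = nx.average_clustering(G)

print("P(k) for k = 3..6:", {k: round(p_k.get(k, 0.0), 3) for k in range(3, 7)})
print("average clustering:", round(avg_clustering, 4))
```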
Automatically characterizing large scale program behavior
, 2002
"... Understanding program behavior is at the foundation of computer architecture and program optimization. Many pro-grams have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and com-pile ..."
Abstract
-
Cited by 778 (41 self)
- Add to MetaCart
(Show Context)
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
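The following is a minimal sketch of the Basic Block Vector idea summarized above, assuming synthetic random count data in place of real profiles and scikit-learn's KMeans in place of the paper's own clustering setup; the interval count, block count, and choice of k are illustrative.

```python
# Hedged sketch: each fixed-length execution interval is summarised by a
# vector of basic-block execution counts (a BBV), the vectors are normalised,
# clustered, and one representative interval per cluster is kept as a
# simulation point. Data and parameters are invented, not the paper's setup.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_intervals, n_basic_blocks = 200, 512          # hypothetical program
bbvs = rng.poisson(lam=5.0, size=(n_intervals, n_basic_blocks)).astype(float)

# Normalise each BBV so intervals are compared by relative block usage,
# not by raw instruction counts.
bbvs /= bbvs.sum(axis=1, keepdims=True)

k = 6
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(bbvs)

# For each cluster, keep the interval closest to the centroid; simulating
# only these intervals (weighted by cluster size) approximates
# whole-program behaviour.
representatives = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(bbvs[members] - km.cluster_centers_[c], axis=1)
    representatives.append(int(members[np.argmin(dists)]))

print("simulation points (interval indices):", representatives)
```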
Survey of clustering algorithms
- IEEE Transactions on Neural Networks, 2005
"... Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the ..."
Abstract
-
Cited by 499 (4 self)
- Add to MetaCart
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.
Image denoising by sparse 3D transform-domain collaborative filtering
- IEEE Transactions on Image Processing, 2007
"... We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2-D image fragments (e.g., blocks) into 3-D data arrays which we call “groups.” Collaborative filtering is a special procedure d ..."
Abstract
-
Cited by 424 (32 self)
- Add to MetaCart
We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2-D image fragments (e.g., blocks) into 3-D data arrays which we call “groups.” Collaborative filtering is a special procedure developed to deal with these 3-D groups. We realize it using the three successive steps: 3-D transformation of a group, shrinkage of the transform spectrum, and inverse 3-D transformation. The result is a 3-D estimate that consists of the jointly filtered grouped image blocks. By attenuating the noise, the collaborative filtering reveals even the finest details shared by grouped blocks and, at the same time, it preserves the essential unique features of each individual block. The filtered blocks are then returned to their original positions. Because these blocks are overlapping, for each pixel, we obtain many different estimates which need to be combined. Aggregation is a particular averaging procedure which is exploited to take advantage of this redundancy. A significant improvement is obtained by a specially developed collaborative Wiener filtering. An algorithm based on this novel denoising strategy and its efficient implementation are presented in full detail; an extension to color-image denoising is also developed. The experimental results demonstrate that this computationally scalable algorithm achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
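A deliberately simplified sketch of the grouping, collaborative filtering, and aggregation pipeline described above follows; the real method also uses a second Wiener-filtering stage and carefully chosen transforms and thresholds, and none of the block sizes or threshold values here come from the paper.

```python
# Hedged sketch of grouping / collaborative filtering / aggregation.
# Constants are illustrative assumptions, not the paper's parameters.
import numpy as np
from scipy.fft import dctn, idctn

def denoise_sketch(noisy, block=8, step=4, search=16, n_match=16, thr=0.25):
    h, w = noisy.shape
    acc = np.zeros_like(noisy)          # weighted sum of filtered blocks
    wts = np.zeros_like(noisy)          # accumulated weights for aggregation
    for i in range(0, h - block + 1, step):
        for j in range(0, w - block + 1, step):
            ref = noisy[i:i + block, j:j + block]
            # Grouping: collect blocks in a local window most similar to ref.
            cands = []
            for y in range(max(0, i - search), min(h - block, i + search) + 1, step):
                for x in range(max(0, j - search), min(w - block, j + search) + 1, step):
                    blk = noisy[y:y + block, x:x + block]
                    cands.append((np.sum((blk - ref) ** 2), y, x))
            cands.sort(key=lambda t: t[0])
            group = np.stack([noisy[y:y + block, x:x + block]
                              for _, y, x in cands[:n_match]])
            # Collaborative filtering: 3-D transform, hard threshold, inverse.
            spec = dctn(group, norm='ortho')
            spec[np.abs(spec) < thr] = 0.0
            filtered = idctn(spec, norm='ortho')
            # Aggregation: return each filtered block to its position, average later.
            for g, (_, y, x) in zip(filtered, cands[:n_match]):
                acc[y:y + block, x:x + block] += g
                wts[y:y + block, x:x + block] += 1.0
    return acc / np.maximum(wts, 1e-8)

# Usage on a small synthetic piecewise-constant image with additive noise.
rng = np.random.default_rng(1)
clean = np.kron(rng.integers(0, 2, (8, 8)).astype(float), np.ones((16, 16)))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
print("residual std before/after:",
      np.std(noisy - clean).round(3),
      np.std(denoise_sketch(noisy) - clean).round(3))
```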
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract
-
Cited by 408 (0 self)
- Add to MetaCart
(Show Context)
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique computational requirements on clustering algorithms.
From frequency to meaning: Vector space models of semantics
- Journal of Artificial Intelligence Research, 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract
-
Cited by 347 (3 self)
- Add to MetaCart
(Show Context)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
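As a small, hedged illustration of the term-document class of VSMs mentioned in the abstract, the sketch below builds a tf-idf weighted term-document matrix with scikit-learn and reads relatedness off as cosine similarity; the toy corpus is invented for illustration.

```python
# Hedged term-document VSM sketch: documents become rows of a tf-idf
# weighted matrix, and semantic relatedness is cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat and a dog played on the mat",
    "stock markets fell sharply on inflation fears",
]
X = TfidfVectorizer().fit_transform(docs)   # documents x terms, tf-idf weighted
sims = cosine_similarity(X)

print(sims.round(2))  # the two pet sentences score high; the finance one does not
```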
Discovering Word Senses from Text.
- In Proceedings of the 8th ACM Conference on Knowledge Discovery and Data Mining (KDD-02), 2002
"... ..."
(Show Context)
Robust Data Clustering
, 2003
"... We address the problem of robust clustering by combining data partitions (forming a clustering ensemble) produced by multiple clusterings. We formulate robust clustering under an information-theoretical framework; mutual information is the underlying concept used in the definition of quantitative me ..."
Abstract
-
Cited by 273 (8 self)
- Add to MetaCart
We address the problem of robust clustering by combining data partitions (forming a clustering ensemble) produced by multiple clusterings. We formulate robust clustering under an information-theoretical framework; mutual information is the underlying concept used in the definition of quantitative measures of agreement or consistency between data partitions. Robustness is assessed by the variance of cluster membership, based on bootstrapping. We propose and analyze a voting mechanism on pairwise associations of patterns for combining data partitions. We show that the proposed technique attempts to optimize the mutual-information-based criteria, although optimality is not ensured in all situations. This evidence accumulation method is demonstrated by combining multiple runs of the well-known K-means algorithm to produce clustering ensembles. Experimental results show the ability of the technique to identify clusters with arbitrary shapes and sizes.
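A hedged sketch of the evidence-accumulation idea follows: run K-means many times, accumulate how often each pair of points is co-clustered, and extract a consensus partition from that co-association matrix. The dataset, number of runs, and the final average-link cut are illustrative choices, not the paper's exact procedure.

```python
# Hedged evidence-accumulation sketch: co-association matrix over many
# K-means runs, then a consensus partition from that matrix.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
n = len(X)

coassoc = np.zeros((n, n))
n_runs = 50
for r in range(n_runs):
    # Deliberately over-clustered K-means runs with different seeds.
    labels = KMeans(n_clusters=15, n_init=1, random_state=r).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs   # fraction of runs in which each pair was co-clustered

# Consensus partition: treat (1 - co-association) as a distance and cut an
# average-link dendrogram into two clusters.
dist = squareform(1.0 - coassoc, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print("consensus cluster sizes:", np.bincount(consensus)[1:])
```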
Evaluation of Hierarchical Clustering Algorithms for Document Datasets
- Data Mining and Knowledge Discovery, 2002
"... Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering solutions provide a view of the data at ..."
Abstract
-
Cited by 261 (6 self)
- Add to MetaCart
Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering solutions provide a view of the data at different levels of granularity, making them ideal for people to visualize and interactively explore large document collections.
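The sketch below illustrates agglomerative (bottom-up) document clustering in the spirit of the abstract: tf-idf vectors, cosine distances, and a dendrogram that can be cut at different levels of granularity. The tiny corpus and the average-link criterion are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged hierarchical document-clustering sketch: tf-idf vectors, cosine
# distances, average-link merge tree, cut at two granularities.
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

docs = [
    "neural networks for image classification",
    "deep learning improves image recognition",
    "stock prices and market volatility",
    "interest rates drive market returns",
]
X = TfidfVectorizer().fit_transform(docs).toarray()
Z = linkage(pdist(X, metric="cosine"), method="average")  # bottom-up merge tree

# The same tree viewed at two levels of granularity.
print("2 clusters:", fcluster(Z, t=2, criterion="maxclust"))
print("3 clusters:", fcluster(Z, t=3, criterion="maxclust"))
```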
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract
-
Cited by 246 (14 self)
- Add to MetaCart
(Show Context)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
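A short, hedged sketch of the conductance measure the abstract scores communities with: conductance(S) = cut(S, complement of S) / min(vol(S), vol(complement of S)), so low values mean few edges leave the set relative to its volume. The planted-partition graph and candidate sets below are synthetic examples, not data from the paper.

```python
# Hedged conductance sketch on a synthetic two-block graph.
import networkx as nx

def conductance(G, S):
    S = set(S)
    cut = sum(1 for u, v in G.edges() if (u in S) != (v in S))  # boundary edges
    vol_S = sum(d for _, d in G.degree(S))                       # sum of degrees in S
    vol_rest = sum(d for _, d in G.degree()) - vol_S
    return cut / max(min(vol_S, vol_rest), 1)

# Two dense 20-node blocks joined by a few sparse cross edges.
G = nx.planted_partition_graph(l=2, k=20, p_in=0.6, p_out=0.02, seed=7)
planted = set(range(20))           # the first planted block
straddling = set(range(10, 30))    # a set that cuts across both blocks

print("conductance of the planted block:", round(conductance(G, planted), 3))
print("conductance of a straddling set: ", round(conductance(G, straddling), 3))
```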