Results 1-10 of 558
Statistical mechanics of complex networks
- Rev. Mod. Phys
Cited by 807 (7 self)
Complex networks describe a wide range of systems in nature and society, with much-quoted examples including the cell, a network of chemicals linked by chemical reactions, and the Internet, a network of routers and computers connected by physical links. While traditionally these systems were modeled as random graphs, it is increasingly recognized that the topology and evolution of real
Generating Representative Web Workloads for Network and Server Performance Evaluation
, 1997
Cited by 681 (8 self)
One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of (1) server file size distribution; (2) request size distribution; (3) relative file popularity; (4) embedded file references; (5) temporal locality of reference; and (6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
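Surge matches six empirical distributions at once; the sketch below covers just two of them, relative file popularity and per-user idle periods, to show how a reference stream can be drawn from such distributions. The Zipf-like popularity law, the Pareto idle-time distribution, the parameter values (`exponent`, `pareto_alpha`), and the function names are assumptions for illustration, not the values measured or used by Surge.

```python
import random

def zipf_weights(num_files, exponent=1.0):
    """Relative popularity: the file with rank r gets weight 1 / r**exponent (assumed Zipf-like law)."""
    return [1.0 / rank ** exponent for rank in range(1, num_files + 1)]

def generate_user_session(num_requests, num_files, rng, exponent=1.0, pareto_alpha=1.5):
    """Return (file_id, idle_time_before_request) pairs for one simulated user (hypothetical helper)."""
    weights = zipf_weights(num_files, exponent)
    # File 0 is the most popular file, file 1 the second most popular, and so on
    file_ids = rng.choices(range(num_files), weights=weights, k=num_requests)
    # Heavy-tailed idle (OFF) periods; random.paretovariate always returns values >= 1
    idle_times = [rng.paretovariate(pareto_alpha) for _ in range(num_requests)]
    return list(zip(file_ids, idle_times))

rng = random.Random(42)
session = generate_user_session(num_requests=1000, num_files=50, rng=rng)
```

Each pair names the file to request and the idle time before issuing it. The other Surge constraints (file and request sizes, embedded references, temporal locality) are not modeled in this sketch.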
The Vocabulary Problem in Human-System Communication
- COMMUNICATIONS OF THE ACM
, 1987
Cited by 353 (6 self)
In almost all computer applications, users must enter correct words for the desired objects or actions. For success without extensive training, or on first tries for new targets, the system must recognize terms that will be chosen spontaneously. We studied spontaneous word choice for objects in five application-related domains and found the variability to be surprisingly large. In every case, two people favored the same term with probability less than 0.20. Simulations show how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction. For example, the popular approach in which access is via one designer's favorite single word will result in 80-90 percent failure rates in many common situations. An optimal strategy, unlimited aliasing, is derived and shown to be capable of several-fold improvements.
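The two quantities in this abstract (how often two people agree on a term, and how much aliasing helps) can be made concrete with a small calculation. Given the terms different people chose for one object, the chance that two independently drawn people pick the same term is estimated here as the sum of squared term shares (drawing with replacement, a simplification), and the success rate of a system that recognizes k aliases is the share of people whose term is among the k most popular. The responses are hypothetical, not data from the study, and the function names are illustrative.

```python
from collections import Counter

def agreement_probability(choices):
    """Chance that two people, drawn independently with replacement, chose the same term."""
    counts = Counter(choices)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

def hit_rate_with_aliases(choices, k):
    """Fraction of people whose spontaneous term is among the k most frequently chosen terms."""
    counts = Counter(choices)
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total

# Hypothetical terms offered by ten people for the same command
responses = ["delete", "delete", "remove", "erase", "kill",
             "delete", "discard", "remove", "trash", "cut"]
```

For these hypothetical responses, the agreement probability is 0.18, a single recognized word serves 30% of users, and three aliases serve 60%. The hit rate keeps rising as more aliases are accepted, which is the intuition behind unlimited aliasing.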
Efficient Filtering of XML Documents for Selective Dissemination of Information
, 2000
Cited by 272 (13 self)
Information dissemination applications are gaining increasing popularity due to dramatic improvements in communications bandwidth and ubiquity. The sheer volume of data available necessitates the use of selective approaches to dissemination in order to avoid overwhelming users with unnecessary information. Existing mechanisms for selective dissemination typically rely on simple keyword matching or "bag of words" information retrieval techniques. The advent of XML as a standard for information exchange and the development of query languages for XML data enable the development of more sophisticated filtering mechanisms that take structural information into account. We have developed several index organizations and search algorithms for performing efficient filtering of XML documents for large-scale information dissemination systems. In this paper we describe these techniques and examine their performance across a range of document, workload, and scale scenarios.
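As a rough illustration of structure-aware filtering, the sketch below matches a document against a set of simple path subscriptions (`/` for a child step, `//` for a descendant step, `*` for any element) while the document is parsed. It is not the index organization or search algorithm from the paper: the only "index" here is a table grouping queries by the element name their last step must match, so each element triggers only the queries that could end at it. The query syntax, function names, and example subscriptions are hypothetical, and the sketch assumes documents without XML namespaces.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def path_query_to_regex(query):
    """Translate e.g. '/a//c' or '/a/*' into a regex over '/'-joined root-to-element tag paths."""
    parts = []
    for axis, name in re.findall(r"(//|/)([^/]+)", query):
        name_pat = "[^/]+" if name == "*" else re.escape(name)
        # '//' allows any number of intermediate elements before the named one
        parts.append(("(?:/[^/]+)*/" if axis == "//" else "/") + name_pat)
    return re.compile("".join(parts))

def build_index(queries):
    """Group compiled queries by the tag their last step must match ('*' is a wildcard bucket)."""
    index = defaultdict(list)
    for qid, query in queries.items():
        index[query.rsplit("/", 1)[-1]].append((qid, path_query_to_regex(query)))
    return index

def filter_document(xml_text, index):
    """Return the ids of queries matching at least one element of the document."""
    matched, stack = set(), []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/" + "/".join(stack)
            # Only queries whose final step could match this element are checked
            for qid, pattern in index.get(elem.tag, []) + index.get("*", []):
                if qid not in matched and pattern.fullmatch(path):
                    matched.add(qid)
        else:
            stack.pop()
    return matched

queries = {"q1": "/news/item/title", "q2": "//title", "q3": "/news//author", "q4": "/catalog/*"}
index = build_index(queries)
doc = "<news><item><title>XML</title><meta><author>A</author></meta></item></news>"
```

Here `filter_document(doc, index)` returns `{"q1", "q2", "q3"}`: the descendant step in `q3` skips over the intermediate `item` and `meta` elements.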
The Web as a graph: measurements, models, and methods
, 1999
Cited by 257 (10 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
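One way to see how such models differ from traditional random graphs is a small generative sketch in which new pages copy links from existing pages. The rule below (pick a random prototype page; for each of d out-links, link to a uniformly random existing page with probability alpha, otherwise copy the prototype's corresponding link) is an assumed example of a copying-style model, not necessarily the exact family proposed in the paper. The seed graph, d, alpha, and the function name are illustrative choices.

```python
import random
from collections import Counter

def copying_model(num_nodes, out_degree, alpha, seed=0):
    """Grow a directed graph; out_links[u] lists the targets of node u's out-links."""
    rng = random.Random(seed)
    d = out_degree
    # Seed graph: d + 1 nodes, each linking to the other d, so every node has d out-links
    out_links = [[v for v in range(d + 1) if v != u] for u in range(d + 1)]
    for new in range(d + 1, num_nodes):
        prototype = rng.randrange(new)  # an existing page to imitate
        links = []
        for i in range(d):
            if rng.random() < alpha:
                links.append(rng.randrange(new))          # fresh, uniformly random link
            else:
                links.append(out_links[prototype][i])    # copy the prototype's i-th link
        out_links.append(links)
    return out_links

graph = copying_model(num_nodes=10_000, out_degree=7, alpha=0.2, seed=1)
in_degree = Counter(target for links in graph for target in links)
```

Because a page that is already heavily linked is more likely to be copied again, in-degrees in the result are much more skewed than in a uniform random graph with the same number of edges; comparing `max(in_degree.values())` with the mean in-degree of 7 is a quick way to see this.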
An Empirical Model of HTTP Network Traffic
, 1997
Cited by 210 (1 self)
The workload of the global Internet is dominated by the Hypertext Transfer Protocol (HTTP), an application protocol used by World Wide Web clients and servers. Simulation studies of this environment will require a model of the traffic patterns of the World Wide Web, in order to investigate the performance aspects of this increasingly popular application. We have developed an empirical model of network traffic produced by HTTP. Instead of relying on server or client logs, our approach is based on gathering packet traces of HTTP network conversations. Through traffic analysis, we have determined statistics and distributions for higher-level quantities such as the size of HTTP items retrieved, the number of items per "Web page", think time, and user browsing behavior. These quantities form a model that can then be used by simulations to mimic World Wide Web network applications in wide-area IP internetworks. Keywords: World Wide Web, HTTP, traffic model, traffic measurements, workload, Interne...
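The abstract derives higher-level quantities such as the number of items per "Web page" and think time from traces. A minimal sketch of one way to do this, assuming that a client's requests closer together than a fixed threshold belong to the same page and that longer gaps are think time, is shown below. The 1-second threshold, measuring gaps between request start times (rather than from the end of the previous transfer), the function name, and the example times are assumptions for illustration.

```python
def group_into_pages(request_times, threshold=1.0):
    """Group one client's sorted request start times (seconds) into pages.

    A gap longer than `threshold` is treated as think time and starts a new page.
    Returns (items_per_page, think_times).
    """
    if not request_times:
        return [], []
    items_per_page = [1]
    think_times = []
    for prev, cur in zip(request_times, request_times[1:]):
        gap = cur - prev
        if gap > threshold:
            think_times.append(gap)     # user paused: close this page, open a new one
            items_per_page.append(1)
        else:
            items_per_page[-1] += 1     # another item (e.g. an inline image) of the same page
    return items_per_page, think_times

# Hypothetical request start times for a single client
times = [0.0, 0.2, 0.5, 12.0, 12.1, 40.0]
items_per_page, think_times = group_into_pages(times)
```

For these times the function yields items per page of [3, 2, 1] and think times of about 11.5 and 27.9 seconds; distributions of both quantities over many clients would then feed a simulation model.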
N-gram-based text categorization
- In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval
, 1994
Cited by 202 (0 self)
Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system is small, fast and robust. This system worked very well for language classification, achieving in one test a 99.8% correct classification rate on Usenet newsgroup articles written in different languages. The system also worked reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject, achieving as high as an 80% correct classification rate. There are also several obvious directions for improving the system’s classification performance in those cases where it did not do as well. The system is based on calculating and comparing profiles of N-gram frequencies. First, we use the system to compute profiles on training set data that represent the various categories, e.g., language samples or newsgroup content samples. Then the system computes a profile for a particular document that is to be classified. Finally, the system computes a distance measure between the document’s profile and each of the
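The profile-and-distance scheme described above can be sketched in a few lines. The abstract is cut off before it names the distance measure, so the rank-difference ("out-of-place") measure used here is an assumption, as are the N range of 1 to 5, the 300-entry profile cap, underscore padding at word boundaries, and the tiny training texts; the function names are hypothetical.

```python
import re
from collections import Counter

def ngram_profile(text, max_n=5, top_k=300):
    """Top_k most frequent character N-grams (N = 1..max_n), most frequent first."""
    counts = Counter()
    # Words are runs of letters (Unicode-aware); pad each with '_' to mark its boundaries
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]

def out_of_place_distance(doc_profile, category_profile):
    """Sum of rank differences; N-grams absent from the category profile get a maximum penalty."""
    category_rank = {gram: r for r, gram in enumerate(category_profile)}
    max_penalty = len(category_profile)
    return sum(abs(r - category_rank[gram]) if gram in category_rank else max_penalty
               for r, gram in enumerate(doc_profile))

def classify(text, category_profiles):
    """Pick the category whose profile is closest to the document's profile."""
    doc_profile = ngram_profile(text)
    return min(category_profiles,
               key=lambda c: out_of_place_distance(doc_profile, category_profiles[c]))

profiles = {
    "english": ngram_profile("the quick brown fox jumps over the lazy dog and then "
                             "the dog sleeps in the sun with the cat"),
    "german": ngram_profile("der schnelle braune fuchs springt über den faulen hund "
                            "und dann schläft der hund in der sonne mit der katze"),
}
label = classify("the lazy dog sleeps in the sun", profiles)
```

Real category profiles would be built from much larger training samples; working on ranks of frequent N-grams rather than on whole words is what makes the comparison tolerant of occasional misspellings.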
Evolution of networks
- Adv. Phys
, 2002
Cited by 201 (1 self)
We review the recent rapid progress in the statistical physics of evolving networks. Interest has focused mainly on the structural properties of random complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of this kind have come into existence recently. This opens a wide field for the study of their topology, evolution, and the complex processes occurring in them. Such networks possess a rich set of scaling properties. A number of them are scale-free and show striking resilience against random breakdowns. In spite of the large sizes of these networks, the distances between most of their vertices are short, a feature known as the “small-world” effect. We discuss how growing networks self-organize into scale-free structures and the role of the mechanism of preferential linking. We consider the topological and structural properties of evolving networks, and percolation in these networks. We present a number of models demonstrating the main features of evolving networks and discuss current approaches for their simulation and analytical study. Applications of the general results to particular networks in Nature are discussed. We demonstrate the generic connections of the network growth processes with the general problems
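Preferential linking can be illustrated with a minimal growth model in which each new vertex attaches to m existing vertices chosen with probability proportional to their current degree (linear preferential attachment, as in the Barabási-Albert model). This is one standard model of the mechanism, not a specific model taken from the review; m, the network size, the seeding rule, and the function name are illustrative assumptions.

```python
import random
from collections import Counter

def preferential_attachment(num_nodes, m, seed=0):
    """Return the edge list of an undirected network grown by linear preferential attachment."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new vertex links to all m seed vertices
    repeated = []             # each vertex appears here once per unit of its degree
    for source in range(m, num_nodes):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        # Uniform sampling from `repeated` is sampling proportional to degree
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

edges = preferential_attachment(num_nodes=2000, m=3, seed=7)
degree = Counter(node for edge in edges for node in edge)
```

The resulting degrees are heavy-tailed: a few early vertices collect far more links than the average degree of about 2m, which is the kind of scale-free structure the review discusses.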
A Brief History of Generative Models for Power Law and Lognormal Distributions
- INTERNET MATHEMATICS
Cited by 192 (7 self)
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying

