Results 1-10 of 4,406
The WEKA Data Mining Software: An Update
Cited by 1756 (15 self)
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Low-Cost Traffic Analysis of Tor
 In Proceedings of the 2005 IEEE Symposium on Security and Privacy. IEEE CS
, 2005
Cited by 231 (8 self)
Tor is the second generation Onion Router, supporting the anonymous transport of TCP streams over the Internet. Its low latency makes it very suitable for common tasks, such as web browsing, but insecure against traffic-analysis attacks by a global passive adversary. We present new traffic-analysis techniques that allow adversaries with only a partial view of the network to infer which nodes are being used to relay the anonymous streams and therefore greatly reduce the anonymity provided by Tor. Furthermore, we show that otherwise unrelated streams can be linked back to the same initiator. Our attack is feasible for the adversary anticipated by the Tor designers. Our theoretical attacks are backed up by experiments performed on the deployed, albeit experimental, Tor network. Our techniques should also be applicable to any low-latency anonymous network. These attacks highlight the relationship between the field of traffic analysis and more traditional computer security issues, such as covert channel analysis. Our research also highlights that the inability to directly observe network links does not prevent an attacker from performing traffic analysis: the adversary can use the anonymising network as an oracle to infer the traffic load on remote nodes in order to perform traffic analysis.
KEEL: A Software Tool to Assess Evolutionary Algorithms for Data Mining Problems
Contraction hierarchies: Faster and simpler . . .
, 2008
Cited by 120 (34 self)
We present a route planning technique solely based on the concept of node contraction. We contract or remove one node at a time out of the graph and add shortcut edges to the remaining graph to preserve shortest-path distances. The resulting contraction hierarchy (CH), the original graph plus shortcuts, also defines an order of “importance” among all nodes through the node selection. We apply a modified bidirectional Dijkstra algorithm that takes advantage of this node order to obtain shortest paths. The search space is reduced by relaxing only edges leading to more important nodes in the forward search and edges coming from more important nodes in the backward search. Both search scopes eventually meet at the most important node on a shortest path. We use a simple but extensible heuristic to obtain the node order: a priority queue whose priority function for each node is a linear combination of several terms, e.g. one term weights nodes depending on the sparsity of the remaining graph after the contraction. Another term regards the already contracted nodes to allow a more uniform contraction. Depending on the application we can select the combination of the priority terms to obtain the required hierarchy.
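The contraction-and-query idea in this abstract can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not from the paper: the graph is undirected, the node order is passed in by the caller rather than computed with the priority-queue heuristic, and a shortcut is added between every pair of remaining neighbours without a witness search (this keeps distances correct but adds more shortcuts than necessary). The names `build_ch`, `upward_dijkstra` and `ch_query` are hypothetical.

```python
import heapq
from collections import defaultdict

INF = float("inf")

def build_ch(edges, order):
    """Contract nodes in the given order (an assumed, caller-supplied order).

    Returns a dict mapping each node to its edges (original or shortcut)
    that lead to later-contracted, i.e. more important, nodes.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        if w < adj[u].get(v, INF):
            adj[u][v] = adj[v][u] = w
    up = {}
    for v in order:
        nbrs = dict(adj[v])      # only uncontracted neighbours remain here
        up[v] = nbrs             # all of them are more important than v
        for u in nbrs:
            del adj[u][v]        # remove v from the remaining graph
        items = list(nbrs.items())
        for i, (a, wa) in enumerate(items):
            for b, wb in items[i + 1:]:
                # Shortcut a-b via v; added without a witness search.
                if wa + wb < adj[a].get(b, INF):
                    adj[a][b] = adj[b][a] = wa + wb
    return up

def upward_dijkstra(up, source):
    """Dijkstra that relaxes only edges leading to more important nodes."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in up.get(v, {}).items():
            nd = d + w
            if nd < dist.get(u, INF):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def ch_query(up, s, t):
    """Shortest-path distance: both upward searches meet at some node."""
    df = upward_dijkstra(up, s)
    db = upward_dijkstra(up, t)  # undirected graph, so backward == upward
    return min((df[v] + db[v] for v in df.keys() & db.keys()), default=INF)

edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1), ("b", "d", 4)]
up = build_ch(edges, order=["b", "c", "a", "d"])
print(ch_query(up, "a", "d"))  # 4, via a-b-c-d
```

For brevity the sketch runs both searches to exhaustion and then combines them, rather than interleaving them with a stopping criterion.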
A Comparison of Statistical Significance Tests for Information Retrieval Evaluation
, 2007
Cited by 116 (10 self)
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student’s paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher’s randomization (permutation) test as nonparametric significance tests for IR but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average precision. We discovered that there is little practical difference between the randomization, bootstrap, and t tests. Both the Wilcoxon and sign test have a poor ability to detect significance and have the potential to lead to false detections of significance. The Wilcoxon and sign tests are simplified variants of the randomization test and their use should be discontinued for measuring the significance of a difference between means.
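As a rough illustration of the randomization test named in this abstract, the following Python sketch estimates a two-sided p-value for the difference in mean score between two systems evaluated on the same topics. It is a sampled approximation (random sign flips of the per-topic differences) and not the exact test over all permutations; the function name `paired_randomization_test`, the trial count and the seed are assumptions made for this example, not details from the paper.

```python
import random

def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Approximate two-sided paired randomization test on the mean difference.

    Under the null hypothesis the two systems are interchangeable, so each
    per-topic difference is equally likely to carry either sign.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    for _ in range(trials):
        # Randomly swap the system labels on each topic.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / n) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical per-topic average precision values for two runs.
run_a = [0.30 + 0.01 * i for i in range(20)]
run_b = [x - 0.10 for x in run_a]
p_value = paired_randomization_test(run_a, run_b)
```

A small `p_value` suggests that the observed difference in means is unlikely under random label swapping; the scores above are made up purely to exercise the function.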
Strategies for Sound Internet Measurement
 IMC'04
, 2004
Cited by 95 (3 self)
Conducting an Internet measurement study in a sound fashion can be much more difficult than it might first appear. We present a number of strategies drawn from experiences for avoiding or overcoming some of the pitfalls. In particular, we discuss dealing with errors and inaccuracies; the importance of associating metadata with measurements; the technique of calibrating measurements by examining outliers and testing for consistencies; difficulties that arise with large-scale measurements; the utility of developing a discipline for reliably reproducing analysis results; and issues with making datasets publicly available. We conclude with thoughts on the sorts of tools and community practices that can assist researchers with conducting sound measurement studies.
Improved base calling for the Illumina Genome Analyzer using
, 2009
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors
, 2008
Cited by 72 (6 self)
Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both static power consumed and maximum frequency supported. In this environment, ignoring variation effects when scheduling applications or when managing power with Dynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for application scheduling and power management. One such power management algorithm, called LinOpt, uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP, maximizing throughput at a given power budget. In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12–17% and reduces the average ED^2 by 30–38%, all relative to using variation-aware scheduling together with a simple extension to Intel’s Foxton power management algorithm.
Engineering Highway Hierarchies
, 2006
Cited by 69 (6 self)
Highway hierarchies exploit hierarchical properties inherent in real-world road networks to allow fast and exact point-to-point shortest-path queries. A fast preprocessing routine iteratively performs two steps: first, it removes edges that only appear on shortest paths close to source or target; second, it identifies low-degree nodes and bypasses them by introducing shortcut edges. The resulting hierarchy of highway networks is then used in a Dijkstra-like bidirectional query algorithm to considerably reduce the search space size without losing exactness. The crucial fact is that ‘far away’ from source and target it is sufficient to consider only high-level edges. Various experiments with real-world road networks confirm the performance of our approach. On a 2.0 GHz machine, preprocessing the network of Western Europe, which consists of about 18 million nodes, takes 13 minutes and yields 48 bytes of additional data per node. Then, random queries take 0.61 ms on average. If we are willing to accept slower query times (1.10 ms), the memory usage can be decreased to 17 bytes per node. We can guarantee that at most 0.014% of all nodes are visited during any query. Results for US road networks are similar. Highway hierarchies can be combined with goal-directed search, they can be extended to answer many-to-many queries, and they are a crucial ingredient for other speedup techniques, namely for transit-node routing and highway-node routing.