Results 1 - 10 of 47,388
Input Data
"... To obtain reliable variant results, the accuracy of sequence alignment, consensus calling and variant detection is of paramount importance. Throughout its history, DNASTAR has emphasized the development of exceptionally accurate software, ensuring that users will obtain the highest quality results. ..."
Abstract
To assess the accuracy of DNASTAR’s next-generation sequence aligner and variant caller for Illumina and Ion Torrent data, we compared whole exome results from DNASTAR’s SeqMan NGen 12.2 with those from CLC Bio’s Genomics Workbench 8.0, another commercial pipeline with a variant detection workflow. Our ...
Input data:
"... - daily precipitation amounts measured at 78 stations covering the Czech Republic (area of 78 864 square km, with complex orography; Fig. 1), with altitudes from 158 to 1324 m a.s.l. The data cover the period of 1961-2000; there are no missing values in this dataset. Extreme precipitation events:- m ..."
Abstract
Data Streams: Algorithms and Applications
, 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Abstract - Cited by 533 (22 self)
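The one-pass, sublinear-memory regime this survey covers can be illustrated with reservoir sampling, a classic streaming primitive. This is a generic sketch, not an excerpt of the survey:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of unknown
    length, using a single pass and O(k) memory."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            reservoir.append(item)
        else:
            # Item t survives with probability k/(t+1); if chosen, it
            # evicts a uniformly random current resident.
            j = rng.randrange(t + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Each item is inspected exactly once, and the memory footprint never grows with the length of the stream — the defining constraints of the data-stream model described above.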
MapReduce: Simplified data processing on large clusters.
- In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04), 2004
"... MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large ..."
Abstract - Cited by 3439 (3 self)
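The programming model can be sketched in a few lines. The phase names and the in-memory `shuffle` below are an illustrative single-machine simulation of what the distributed run-time does, not the actual implementation:

```python
from collections import defaultdict

def map_phase(documents, map_fn):
    # Each mapper emits (key, value) pairs independently.
    for doc in documents:
        yield from map_fn(doc)

def shuffle(pairs):
    # Group all values by key (the framework does this between phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Each reducer folds the value list for one key into a final result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# The canonical word-count example of the model.
docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = map_phase(docs, lambda doc: ((w, 1) for w in doc.split()))
counts = reduce_phase(shuffle(pairs), lambda word, ones: sum(ones))
```

The user supplies only the two pure functions; partitioning, scheduling, and fault handling live entirely in the framework, which is what makes the model accessible to programmers without distributed-systems experience.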
Fuzzy extractors: How to generate strong keys from biometrics and other noisy data
, 2008
"... We provide formal definitions and efficient secure techniques for • turning noisy information into keys usable for any cryptographic application, and, in particular, • reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying mater ..."
Abstract - Cited by 535 (38 self)
for various measures of “closeness” of input data, such as Hamming distance, edit distance, and set difference.
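For the Hamming-distance case, a common way to realize such a construction is the code-offset sketch. The toy repetition code below is an illustrative stand-in for the error-correcting code a real fuzzy extractor would use, and the function names are mine, not the paper's:

```python
import random

def encode(key_bits, r):
    # Toy ECC: repeat each key bit r times (a repetition code).
    return [b for b in key_bits for _ in range(r)]

def decode(code_bits, r):
    # Majority vote per r-bit block corrects fewer than r/2 flips per block.
    return [int(sum(code_bits[i:i + r]) > r // 2)
            for i in range(0, len(code_bits), r)]

def gen(w, key_len, r, rng=random):
    # Code-offset sketch s = w XOR C(k); with enough entropy in w,
    # publishing s leaks little about the key k.
    key = [rng.randrange(2) for _ in range(key_len)]
    sketch = [wi ^ ci for wi, ci in zip(w, encode(key, r))]
    return key, sketch

def rep(w_noisy, sketch, r):
    # Recover k from a noisy reading w' close to w in Hamming distance:
    # w' XOR s is a noisy codeword, which the ECC decodes back to k.
    return decode([wi ^ si for wi, si in zip(w_noisy, sketch)], r)
```

A deployed scheme would use a stronger code (e.g. BCH) and a randomness extractor on the recovered key; the structure — sketch on enrollment, decode on authentication — is the same.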
A block-sorting lossless data compression algorithm
, 1994
"... We describe a block-sorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block of input text. The transformation does not itself compress the data, but reorders it to make it easy to compress with simple algorithms such as move-to-front coding. Our algorithm achieves speed comparable to algorithms based on the techniques of Lempel and Ziv, but obtains compression close to the best ..."
Abstract - Cited by 809 (5 self)
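The reversible transformation described here is the Burrows-Wheeler transform. A compact sketch, with the move-to-front stage the abstract mentions (the quadratic inverse is for clarity; production decoders use LF-mapping):

```python
def bwt(text, sentinel="\0"):
    # Append a unique end marker, sort all rotations, keep the last column.
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

def inverse_bwt(last, sentinel="\0"):
    # Rebuild the sorted rotation table one column at a time.
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]

def move_to_front(data, alphabet):
    # After the transform, equal symbols cluster, so MTF emits runs of
    # small indices that a simple entropy coder compresses well.
    symbols = list(alphabet)
    out = []
    for ch in data:
        i = symbols.index(ch)
        out.append(i)
        symbols.insert(0, symbols.pop(i))
    return out
```

The transform itself stores no extra information beyond the sentinel: sorting the rotations groups similar contexts, and the inverse recovers the original block exactly.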
Language-Based Information-Flow Security
- IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS
, 2003
"... Current standard security practices do not provide substantial assurance that the end-to-end behavior of a computing system satisfies important security policies such as confidentiality. An end-to-end confidentiality policy might assert that secret input data cannot be inferred by an attacker throug ..."
Abstract - Cited by 827 (57 self)
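A minimal sketch of the idea behind language-based tracking: values carry confidentiality labels, operations propagate the join of their operands' labels, and a check guards the public sink. This toy monitor handles only explicit flows (a real language-based system also tracks implicit flows through control structure), and the class and function names are illustrative:

```python
HIGH, LOW = "high", "low"

class Labeled:
    """A value tagged with a confidentiality label."""
    def __init__(self, value, label=LOW):
        self.value, self.label = value, label

    def __add__(self, other):
        other = other if isinstance(other, Labeled) else Labeled(other)
        # Propagate the least upper bound: any HIGH operand taints the result.
        label = HIGH if HIGH in (self.label, other.label) else LOW
        return Labeled(self.value + other.value, label)

def public_output(x):
    # The sink check: secret data must not flow to a public channel.
    if isinstance(x, Labeled) and x.label == HIGH:
        raise PermissionError("information-flow violation: high -> low")
    return x.value if isinstance(x, Labeled) else x
```

The end-to-end guarantee the paper is after (noninterference) says that varying the HIGH inputs must not change anything observable at LOW outputs; the label propagation above is the runtime shadow of that static discipline.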
Automatic Subspace Clustering of High Dimensional Data
- Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the or ..."
Abstract - Cited by 724 (12 self)
identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
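The grid-based first step of a CLIQUE-style algorithm can be sketched as follows. The parameter names `xi` (intervals per dimension) and `tau` (density threshold) follow the paper's conventions, but the code itself is an illustrative reconstruction:

```python
from collections import Counter

def dense_units_1d(points, xi, tau):
    """First pass of a CLIQUE-style grid method: split each dimension
    into xi equal intervals and keep the units holding at least a tau
    fraction of the points. Higher-dimensional dense units are then
    built bottom-up from these candidates (omitted here)."""
    n, dims = len(points), len(points[0])
    dense = {}
    for d in range(dims):
        lo = min(p[d] for p in points)
        hi = max(p[d] for p in points)
        width = (hi - lo) / xi or 1.0  # guard against a degenerate dimension
        counts = Counter(
            min(int((p[d] - lo) / width), xi - 1) for p in points)
        dense[d] = sorted(u for u, c in counts.items() if c / n >= tau)
    return dense
```

Because counting per grid cell never compares records to each other, the result is identical however the input records are ordered, and no distributional form is assumed — the two properties the abstract highlights.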
Clustering by passing messages between data points
- Science
, 2007
"... Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges ..."
Abstract - Cited by 696 (8 self)
Analysis of Recommendation Algorithms for E-Commerce
, 2000
"... Recommender systems apply statistical and knowledge discovery techniques to the problem of making product recommendations during a live customer interaction and they are achieving widespread success in E-Commerce nowadays. In this paper, we investigate several techniques for analyzing large-scale pu ..."
Abstract - Cited by 523 (22 self)
... the web-purchasing transactions of a large E-commerce company, whereas the second data set was collected from the MovieLens movie recommendation site. For the experiments, we divide the recommendation generation process into three sub-processes: representation of input data, neighborhood formation ...
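Neighborhood formation, one of the sub-processes named above, is commonly done by ranking users by cosine similarity over their rating vectors. A minimal sketch — the dict-based representation and function names are illustrative, not the paper's:

```python
from math import sqrt

def cosine(u, v):
    # u, v are sparse rating vectors: dicts mapping item -> rating.
    dot = sum(u[i] * v[i] for i in u if i in v)
    nu = sqrt(sum(r * r for r in u.values()))
    nv = sqrt(sum(r * r for r in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def neighborhood(target, users, k):
    # Rank all other users by similarity to the target; keep the top k.
    sims = sorted(((cosine(users[target], v), name)
                   for name, v in users.items() if name != target),
                  reverse=True)
    return [name for _, name in sims[:k]]
```

Recommendations are then generated from the items the neighborhood rated that the target user has not yet seen, completing the three-stage pipeline the abstract outlines.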