Results 1 – 10 of 114
IBM Almaden, 2009
Abstract
The rank problem in succinct data structures asks to preprocess an array A[1..n] of bits into a data structure using as close to n bits as possible, and answer queries of the form Rank(k) = Σ_{i=1}^{k} A[i]. The problem has been intensely studied, and features as a subroutine in a majority of succinct data structures. We show that in the cell probe model with w-bit cells, if rank takes time t, the space of the data structure must be at least n + n/w^{O(t)} bits. This redundancy/query tradeoff is essentially optimal, matching our upper bound from [FOCS'08].
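The rank query can be illustrated with a toy block-based structure — a hypothetical sketch of the general idea, not the paper's optimal construction; the names `RankBitVector` and `W` are ours:

```python
# Toy rank structure: precompute prefix counts at W-bit block boundaries,
# so Rank(k) is one table lookup plus a scan within a single block.
# (Illustrative only; the paper's construction achieves far less redundancy.)

W = 8  # block width in bits (playing the role of the cell size w)

class RankBitVector:
    def __init__(self, bits):
        self.bits = bits
        # blocks[j] = number of 1s in bits[0 : j*W]
        self.blocks = [0]
        for j in range(0, len(bits), W):
            self.blocks.append(self.blocks[-1] + sum(bits[j:j + W]))

    def rank(self, k):
        """Number of 1s among the first k bits (Rank(k) in the abstract)."""
        j = k // W
        return self.blocks[j] + sum(self.bits[j * W : k])
```

The `blocks` table is the "redundancy": extra bits beyond the raw array, traded against query time.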
IBM Almaden, 903
Abstract
We present a new method for computing core universal solutions in data exchange settings specified by source-to-target dependencies, by means of SQL queries. Unlike previously known algorithms, which are recursive in nature, our method can be implemented directly on top of any DBMS. Our method is based on the new notion of a laconic schema mapping: a schema mapping for which the canonical universal solution is the core universal solution. We give a procedure by which every schema mapping specified by FO s-t tgds can be turned into a laconic schema mapping specified by FO s-t tgds that may refer to a linear order on the domain of the source instance. We show that our results are optimal, in the sense that the linear order is necessary and the method cannot be extended to schema mappings involving target constraints.
IBM Almaden
Abstract
One of the main steps towards integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new "divide-and-merge" paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand, and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance.
IBM Almaden, 2014
Abstract
We design a new distribution over poly(rε^{-1}) × n matrices S so that for any fixed n × d matrix A of rank r, with probability at least 9/10, ‖SAx‖₂ = (1 ± ε)‖Ax‖₂ simultaneously for all x ∈ R^d. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of non-zero entries of A. This improves over all previous subspace embeddings, which required at least Ω(nd log d) time to achieve this property. We call our matrices S sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp-regression:
• to output an x′ for which ‖Ax′ − b‖₂ ≤ (1 + ε) min_x ‖Ax − b‖₂ for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A)) + Õ(d³ε^{-2}) time, and another in O(nnz(A) log(1/ε)) + Õ(d³ log(1/ε)) time. (Here Õ(f) = f · log^{O(1)}(f).)
• to obtain a decomposition of an n × n matrix A into a product of an n × k matrix L, a k × k diagonal matrix D, and an n × k matrix W, for which ‖A − LDW^T‖_F ≤ (1 + ε)‖A − A_k‖_F, where A_k is the best rank-k approximation, our algorithm runs in O(nnz(A)) + Õ(nk²ε^{-4} + k³ε^{-5}) time.
• to output an approximation to all leverage scores of an n × d input matrix A simultaneously, with constant relative error, our algorithms run in O(nnz(A) log n) + Õ(r³) time.
• to output an x′ for which ‖Ax′ − b‖_p ≤ (1 + ε) min_x ‖Ax − b‖_p for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A) log n) + poly(rε^{-1}) time, for any constant 1 ≤ p < ∞.
We optimize the polynomial factors in the above stated running times, and show various tradeoffs. Finally, we provide preliminary experimental results which suggest that our algorithms are of interest in practice.
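A sparse embedding matrix of this kind can be sketched as a CountSketch-style construction — one random signed nonzero per column of S — so that computing S·A touches each row of A exactly once, i.e. O(nnz(A)) work. This is a minimal illustration under that assumption, not the paper's analysis; the function names are ours:

```python
import numpy as np

def sparse_embedding(n, m, rng):
    """Describe an m x n sparse embedding S by (row index, sign) per column.

    Each column of S has exactly one nonzero entry, +1 or -1, in a
    uniformly random row (illustrative CountSketch-style construction).
    """
    rows = rng.integers(0, m, size=n)        # row of the single nonzero per column
    signs = rng.choice([-1.0, 1.0], size=n)  # its random sign
    return rows, signs

def apply_embedding(rows, signs, m, A):
    """Compute S @ A in one pass over the rows of A (O(nnz(A)) for sparse A)."""
    SA = np.zeros((m, A.shape[1]))
    for i in range(A.shape[0]):
        SA[rows[i]] += signs[i] * A[i]
    return SA
```

Because each row A[i] is visited once and added to a single row of SA, no dense matrix multiplication is ever formed.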
IBM Almaden
Abstract
We propose a novel mechanism for routing and bandwidth allocation that exploits the selfish and rational behavior of flows in a network. Our mechanism leads to allocations that simultaneously optimize throughput and fairness criteria. We analyze the performance of our mechanism in terms of the induced Nash equilibrium. We compare the allocations at the Nash equilibrium with throughput-optimal allocations as well as with fairness-optimal allocations. Our mechanism offers a smooth tradeoff between these criteria, and allows us to produce allocations that are approximately optimal with respect to both. Our mechanism is also fairly simple and admits an efficient distributed implementation.
IBM Almaden and
Abstract
An aggregate array computation is a loop that computes accumulated quantities over array elements. Such computations are common in programs that use arrays, and the array elements involved in such computations often overlap, especially across iterations of loops, resulting in significant redundancy in the overall computations. This article presents a method and algorithms that eliminate such overlapping aggregate array redundancies and shows analytical and experimental performance improvements. The method is based on incrementalization, that is, updating the values of aggregate array computations from iteration to iteration rather than computing them from scratch in each iteration. This involves maintaining additional values not maintained in the original program. We reduce various analysis problems to solving inequality constraints on loop variables and array subscripts, and we apply results from work on array data dependence analysis. For aggregate array computations that have significant redundancy, incrementalization produces drastic speedup compared to previous optimizations; when there is little redundancy, the benefit might be offset by cache effects and other factors. Previous methods for loop optimizations of arrays do not per
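The incrementalization idea — updating an aggregate's value from one loop iteration to the next instead of recomputing it from scratch — can be illustrated with a sliding-window sum (our own toy example, not the article's transformation):

```python
# Naive aggregate array computation: each window sum is recomputed
# from scratch, so the overlapping elements are re-added -> O(n*k).
def window_sums_naive(a, k):
    return [sum(a[i:i + k]) for i in range(len(a) - k + 1)]

# Incrementalized version: maintain the previous window's sum and
# update it with the entering/leaving elements -> O(n) total.
def window_sums_incremental(a, k):
    s = sum(a[:k])               # compute the first window once
    out = [s]
    for i in range(k, len(a)):
        s += a[i] - a[i - k]     # add entering element, drop leaving one
        out.append(s)
    return out
```

The extra value maintained across iterations (`s`) is exactly the kind of additional state the article's method introduces to eliminate overlapping redundancy.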
The TSIMMIS Project: Integration of Heterogeneous Information Sources
Abstract
Cited by 535 (19 self)
The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center.
On the Streaming Model Augmented with a Sorting Primitive
Gagan Aggarwal (Stanford University), Mayur Datar (Google), Sridhar Rajagopalan (IBM Almaden), Matthias Ruhl (Google)
Abstract
The need to deal with massive data sets in many practical applications has led to a growing interest in computational models appropriate for large inputs. The most important quality of a realistic model is that it can be efficiently implemented across a wide range of platforms and operating systems.
IBM Research—Almaden
Abstract
Network Attached Storage (NAS) and Virtual Machines (VMs) are widely used in data centers thanks to their manageability, scalability, and ability to consolidate resources. But the shift from physical to virtual clients drastically changes the I/O workloads seen on NAS servers, due to guest file system encapsulation in virtual disk images and the multiplexing of request streams from different VMs. Unfortunately, current NAS workload generators and benchmarks produce workloads typical of physical machines. This paper makes two contributions. First, we studied the extent to which virtualization is changing existing NAS workloads. We observed significant changes, including the disappearance of file system metadata operations at the NAS layer, changed I/O sizes, and increased randomness. Second, we created a set of versatile NAS benchmarks to synthesize virtualized workloads. This allows us to generate accurate virtualized workloads without the effort and limitations associated with setting up a full virtualized environment. Our experiments demonstrate that the relative error of our virtualized benchmarks, evaluated across 11 parameters, averages less than 10%.