19 citations found.
G. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput., 20:756--765, 1991.

This paper is cited in the following contexts:
Uniform Constant-Depth Threshold Circuits for Division and.. - Hesse, Allender (2002)   (1 citation)  (Correct)

....is a short number. It is equal to the integer part of the sum of the k rational numbers x_i h_i C_i / M, or x_i h_i / m_i, each of which is between 0 and m_i. The computation of this rank function is central to the argument of [21] that Division is in L-uniform TC^0. It is computable in logspace [25, 43], and in fact the algorithms can be adapted to put it in FOM + POW. For more detail on this see [6]. Here we present a self-contained argument, without computing rank directly, that conversion from CRR to binary is in FOM + POW. First we note again that we can carry out the other conversion, ....

....as ap + b with b = 2^k mod p. The kth bit of the binary expansion of the rational number 1/p is defined to be the low order bit of a. But since ap is congruent to b modulo 2 and p is odd, this is also the low order bit of b. Since b is 2^k mod p, it can clearly be computed in FO + POW. Lemma 4.3. [25, 27] Let X and Y be numbers less than M given in CRR_M form. In FOM + POW we can determine whether X < Y. Proof. Clearly, X < Y if and only if X/M < Y/M. Thus it is sufficient to show that we can compute X/M to polynomially many bits of accuracy. Recall that X = (Σ_{i=1}^{k} x_i h_i C_i) − rank_M(x) M. ....

G. I. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput., 20:756--765, 1991.


The Dynamic Complexity of Transitive Closure is in DynTC0 - Hesse (2002)   (Correct)

....only state that there exist polynomial-time uniform TC^0 circuits to evaluate them. A series of results by Chiu, Davida, Litow, Allender, Barrington, and Hesse shows that the circuits for division, powering, and iterated multiplication used to evaluate polynomials can be made DLOGTIME-uniform [1,8,3,4]. In particular, finding the product of n (polynomially many) numbers, each with n bits, can be done by a DLOGTIME-uniform TC^0 circuit. Evaluating the above polynomial requires us to raise numbers of O(n ) bits 11 to powers up to n 2, multiply the results by other numbers, add n 1 ....

George I. Davida and Bruce Litow. Fast Parallel Arithmetic via Modular Representation. SIAM Journal on Computing, 20(4):756--765, August 1991.


Division is in Uniform TC^0 - Hesse (2001)   (5 citations)  (Correct)

....(CRR). The circuits work by converting the inputs to CRR, computing iterated products in that representation, and converting the output to binary. In the later 1990s, Chiu, Davida, and Litow devised new ways of computing in CRR that reduced the complexity of converting from CRR into binary [5, 6]. These steps allowed them to construct logspace-uniform and NC^1-uniform TC^0 circuits for division and iterated multiplication. Allender and Barrington reinterpreted those results in the framework of descriptive complexity, and showed that the only difficulty in expressing iterated ....

G. I. Davida and B. Litow. Fast Parallel Arithmetic via Modular Representation. SIAM Journal on Computing, 20(4):756--765, 1991.


Uniform Circuits for Division: Consequences and Problems - Allender, Barrington, Hesse (2000)   (2 citations)  (Correct)

....is a short number. It is equal to the integer part of the sum of the k rational numbers x_i h_i C_i / M, or x_i h_i / m_i, each of which is between 0 and m_i. The computation of this rank function is central to the argument of [11] that DIVISION is in L-uniform TC^0. It is computable in logspace [13, 21], and in fact the algorithms can be adapted to put it in FOM + POW. For more detail on this see [2]. Here we present a self-contained argument, without computing rank directly, that conversion from CRR to binary and DIVISION are in FOM + POW. First we note again that we can carry out the other ....

....write 2^k as ap + b with b = 2^k mod p. The kth bit of the binary expansion of the rational number 1/p is defined to be the low order bit of a. But since ap is congruent to b modulo 2 and p is odd, this is also the low order bit of b. The latter can clearly be computed in FO + POW. Lemma 3.3 [13, 14] Let X and Y be numbers less than M given in CRR_M form. In FOM + POW we can determine whether X < Y. Proof. Clearly, X < Y if and only if X/M < Y/M. Thus it is sufficient to show that we can compute X/M to polynomially many bits of accuracy. Recall that X = (Σ_{i=1}^{k} x_i h_i C_i) − rank ....

G. I. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput., 20:756--765, 1991.


The Division Breakthroughs - Allender (2001)   (Correct)

....The reader may wish to prove that if p is an odd prime, then the kth bit of the binary expansion of the rational number 1/p is the low order bit of 2^k mod p. It is not at all obvious how to tell, given two numbers in CRR, which is larger. Logspace algorithms for this were presented in [14, 15]. Using the preceding lemma, we obtain yet another algorithm. Lemma 4.2 Let X and Y be numbers less than M given in CRR_M form. In FOM + POW we can determine whether X < Y. Proof. Clearly, X < Y if and only if X/M < Y/M. Thus it is sufficient to show that we can compute X/M to polynomially many ....

G. I. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput., 20:756--765, 1991.


Uniform Circuits for Division: Consequences and Problems - Allender, Barrington (2000)   (2 citations)  (Correct)

....and X M , we can compute (x_1, ..., x_k) (the CRR_M form of X) in FOM + POW. Proof. For each modulus m_i and each j n we must calculate 2^j (mod m_i) (given by the power predicate), add the results (using iterated addition in FOM), and take the result modulo m_i (in FO). Lemma 3.2 [11, 20] The rank of X with respect to M is computable in FOM + POW. Proof. This result was first shown by Davida and Litow [11], but here we sketch an easier argument by Macarie [20]. Note that by Lemma 3.1 we can assume that X is given in CRR_M form. If we approximate each of the numbers x i D i M to ....

....j n we must calculate 2^j (mod m_i) (given by the power predicate), add the results (using iterated addition in FOM), and take the result modulo m_i (in FO). Lemma 3.2 [11, 20] The rank of X with respect to M is computable in FOM + POW. Proof. This result was first shown by Davida and Litow [11], but here we sketch an easier argument by Macarie [20]. Note that by Lemma 3.1 we can assume that X is given in CRR_M form. If we approximate each of the numbers x i D i M to O(log n) bits of accuracy and then add the approximations (using iterated addition, in FOM) we get the right answer ....

[Article contains additional citation context not shown here]

G. I. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput., 20:756--765, 1991.


A Library for Parallel Modular Arithmetic - Power, Bradford   (Correct)

....in section 2.2. For the division A = qB + r when B is a small integer we can first find r by taking A mod B as in section 2.2. Then subtracting r from A we can get q by exact division as before. When B is large the process is much more difficult. While several different algorithms have been proposed [9, 3, 8, 13], only [9] proposes an algorithm for general division on a parallel computer. General division is performed using a modified version of Knuth's classical q division algorithm presented in [11]. An approximate reconstruction is used to calculate the most significant digits of the numbers in each ....

G.I. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput, 20(4):756-765, 1991.


Space-Efficient Deterministic Simulation of Probabilistic Automata - Macarie (1993)   (14 citations)  (Correct)

....tool (from the point of view of space complexity) to compare numbers having their residue representations. Other applications of this technique can be found in [DMS 1993]. Davida and Litow, independently, propose another method to efficiently compare numbers given their residue representations [DL 1991]. The two methods turn out to have similar strength. Although one was expressed in the setting of deterministic space and the other in the setting of parallel time, they can easily be adapted to the other context. A more detailed comparison between them can be found in [DMS 1993]. Theorem 4 S ....

....this subject. I also thank them for carefully reading an earlier version of this paper and suggesting significant improvements. I also thank Helmut Jurgensen for his comments on a recent version of this TR and Marius Zimand and Bruce Litow for bringing to my attention the paper by Davida and Litow [DL 1991]. 3 Actually, Freivalds used a different notion of space bounded probabilistic complexity class, but our simulation can also be adapted to his case. 14 ....

Davida, G.I., and Litow, B. Fast parallel arithmetic via modular representation. SIAM J. Comput. 20, 1991, pp. 756-765.


On TC^0, AC^0, and Arithmetic Circuits - Agrawal, Allender, Datta (1997)   (Correct)

....or of q 0 i (x 0 ) are not all 1. So we just need to show that each of the q 0 i s and i can be computed using Logspace-uniform TC^0 circuits. But this follows from Lemma 24 7,8 below. We remark that, instead of relying on [DMS94], it is also possible to make use of similar results of [Lit92, DL91]. It seems to us that the construction of [DMS94] results in a simpler circuit. Lemma 23 The following are computable in O(log n) space, where in the following x is n bits long and p, p_i, g, k, z are all O(log n) bits long. 1. A generator g of the multiplicative group Z_p^*, given prime p. 2. ....

George I. Davida and Bruce Litow. Fast parallel arithmetic via modular representation. SIAM Journal on Computing, 20(4):756--765, August 1991.


Bits and Relative Order from Residues, Space Efficiently - Dietz, Macarie, Seiferas (1994)   (Correct)

....to achieve this for relative order and for bits near the ends of the binary representation, and we come close (O(S k log S k )) for arbitrary bits. Such residue representations of numbers have been studied more extensively in connection with the design of hardware for fast computer arithmetic [Va55, SV55, Ga59, Sz62, MPK83, KCT62, ST67, SVC83, VV85, LC92, ZSY93, DL91]. In such a residue number system, we obtain fast addition and multiplication by performing these operations independently and in parallel modulo each of the primes (or modulo other appropriately chosen moduli). The issues that arise are similar, but the primary goal is the minimization of ....

....are similar, but the primary goal is the minimization of parallel time, rather than (sequential) space. We could use a result of Borodin [Bo77] to derive some of our results from analogous results for parallel time. Independently using a general approach quite similar to ours, Davida and Litow [DL91] have in fact obtained some of the needed analogous results, the ones needed for our Lemma 1 and Theorem 1. By working entirely in the setting of space even for these results, however, we are able to give Key words and phrases. residue number system, Chinese remainder theorem, binary ....

[Article contains additional citation context not shown here]

G. I. Davida and B. Litow, Fast parallel arithmetic via modular representation, SIAM Journal on Computing 20, 4 (August, 1991), 756--765.


On TC^0, AC^0, and Arithmetic Circuits - Agrawal, Allender, Datta (1997)   (Correct)

....of q 0 i (x 0 ) are not all 1. So we just need to show that each of the q 0 i s and i can be computed using Logspace-uniform TC^0 circuits. But this follows from Lemma 23 7,8 below. We remark that, instead of relying on [DMS94], it is also possible to make use of similar results of [Lit92, DL91]. It seems to us that the construction of [DMS94] results in a simpler circuit. Lemma 22 The following are computable in O(log n) space, where in the following x is n bits long and p, p_i, g, k, z are all O(log n) bits long. 1. A generator g of the multiplicative group Z_p^*, given prime p. 2. ....

George I. Davida and Bruce Litow. Fast parallel arithmetic via modular representation. SIAM Journal on Computing, 20(4):756--765, August 1991.


Integer Division in Residue Number Systems - Hitz, Kaltofen (1995)   (4 citations)  (Correct)

....al. algorithm [2]. Most RNS division algorithms use some form of binary expansion for the quotient or the reciprocal. They are usually closely tied to their respective hardware implementation, making the complexity analysis difficult. An exception in this category is the work of Davida and Litow [4]. They give the complete analysis for an almost uniform logdepth division circuit. Another more recent result by Lu and Chiang [8] is based on comparison and binary search. In our approach, we use an extended RNS which provides roughly the square of a normal RNS range for intermediate results. We ....

....and RECIP using RNS operations. However, it is not the scope of this paper to discuss hardware realizations. Various models of computation were used so far to derive the complexity of basic RNS operations. More recently, a thorough analysis for NC^1 circuits was carried out by Davida and Litow [4]. Unfortunately conversions were left out. We will show in the Appendix that conversion from RNS to mixed radix representation can be implemented in depth O(log n) using O(n^2) RNS processor elements; by RNS processor element we mean a circuit for arithmetic or boolean operations, such as ....

[Article contains additional citation context not shown here]

Davida, G. I. and Litow, B., "Fast parallel arithmetic via modular representation," SIAM J. Comput., vol.20, pp. 756--765, 1991.


Complexity Of Parallel Arithmetic Using The Chinese Remainder.. - Chiu (1995)   (4 citations)  Self-citation (Davida)   (Correct)

....[1] Since the iterated product problem has been hypothesized as a candidate for a problem within PTIME log space, research into the parallel time complexity of division is also closely tied to low level space complexity. Approaches to parallel division with better than O(log^2 n) time, e.g. [10, 1, 12, 3, 2, 8], have all used alternate number representation systems, such as discrete Fourier transforms, or Chinese remaindering. These alternate systems represented numbers as small, independent units. This allowed for greater parallelism than ordinary binary arithmetic since the circuits were not limited ....

....parallel time complexity as multiplication, with both being in NC^1. Our circuit uses the Chinese remainder representation to compute a geometric series approximation for the reciprocal quickly. Our approach is based on the O(log n) depth division circuit presented in [1] and later extended by [3]. This thesis also presents O(log n) time circuits for a wide variety of other arithmetic operations, such as addition, multiplication, and comparison, in the Chinese remainder representation. 1.2 A Brief Survey of Parallel Arithmetic Circuits In this section, we describe the simplest, fast ....

[Article contains additional citation context not shown here]

G. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM Journal on Computing, 20:756--765, 1991.


NC1 Division - Chiu, Davida, Litow (2000)   (1 citation)  Self-citation (Davida Litow)   (Correct)

....WI, USA davida cs.uwm.edu k School of Information Technology, James Cook University, Townsville, Qld. 4811, Australia bruce cs.jcu.edu.au 1 Following the Beame, Cook, Hoover result, Davida and Litow showed how to compute integer division by log depth, polynomial size Boolean circuits in [6]. However, these circuits are not log uniform, although they can be computed using just slightly more space. Using a method due to Reif and Chinese remaindering it is shown there that log n depth, n^O(1) size division circuits can be produced using O(log(n) · log log(n)) space. See [12] ....

....of the binary notations for x, x^2, ..., x^n where x is an n bit integer. The next theorem is a principal result from [1]. Theorem 2 Either division, iterated product and powering are all in NC1, or none is in NC1. The next fact is relatively straightforward to prove, see e.g. Lemma 4.5 [6]. Theorem 3 If x 1 ; x n ; m n, the computation of x_1 ··· x_n, x_1 ··· x_n mod m and x_1 · x_2 mod m are in NC1. Proof: A proof that x_1 ··· x_n is in NC1 may be found as Lemma 4.5 in [6]. Letting y = x_1 · ....

[Article contains additional citation context not shown here]

G. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comp., 20,4:756--765, 1991.


Parallel Complexity of Integer Coprimality - Litow (2000)   (1 citation)  Self-citation (Litow)   (Correct)

....in [12] and information about the parallel complexity class NC may be found in [11, 4]. It is also known that division can be done in the same time and size bounds, but slightly more than logspace is needed to build the requisite Boolean circuits. It is open whether or not division is in NC1. See [2, 5, 7] for more information about division. It is natural to ask about other arithmetic functions. Perhaps the most important function after the basic operations is GCD (greatest common divisor). Unfortunately, very little is known about the parallel complexity of GCD. In this situation it makes sense ....

G. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comp., 20,4:756--765, 1991.


Integer Division is in NC¹ - Chiu, Davida, Litow (1995)   Self-citation (Davida Litow)   (Correct)

....7 Let x = x r Gamma1 ; x 0 ] be an n bit CRR integer with moduli base M and product M . From the Chinese remainder theorem, X i r x i i = R(x; M)M x . The coefficient R(x; M) is known as the rank of x with respect to moduli base M. We now give a method for computing rank from [3]. We start by rewriting the Chinese remainder theorem, X i r x i =m i = R x=M (6) Let oe i be the first q bits of the binary expansion of x i =m i , where r Delta 2 Gammaq 1=4. We will approximate P x i =m i by P oe i . Our approximation error, ffl, is bounded by 0 ffl = X i r x ....

....0 = h7; 5i. R = x mod D = 2 = 2] using D R mod M = 2 = 2; 2; 2] extend base twice to M x Gamma R = 18 = 6; 0; 2] Gamma [2; 2; 2] 4; 3; 0] D Gamma1 mod M=D = 5; 2] using CRR M 0 (x Gamma R)D Gamma1 mod M=D = 4; 3] 5; 2] 6; 1] bx=Dc = 6 = 6; 1; 0] extend base M = M 0 [3] Theorem 9 CRR scaling can be computed in NC 1 . Proof : The proof follows the steps in the example. Let M 0 = M Gamma D. We make the observation that if N M, then the N vector for y can be obtained from the M vector for y by simply deleting the entries corresponding to moduli in M Gamma ....

G. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM Journal on Computing, 20:756--765, 1991.


Uniform Constant-Depth Threshold Circuits for Division and .. - Hesse, Allender, et al. (2002)   (1 citation)  (Correct)

No context found.

G. Davida and B. Litow. Fast parallel arithmetic via modular representation. SIAM J. Comput., 20:756--765, 1991.


On TC^0, AC^0, and Arithmetic Circuits - Agrawal, Allender, Datta (2000)   (Correct)

No context found.

George I. Davida and Bruce Litow. Fast parallel arithmetic via modular representation. SIAM Journal on Computing, 20(4):756--765, August 1991.


Integer Division in Residue Number Systems - Hitz, Kaltofen (1995)   (4 citations)  (Correct)

No context found.

Davida, G. I. and Litow, B., "Fast parallel arithmetic via modular representation," SIAM J. Comput., vol.20, pp. 756--765, 1991.

CiteSeer.IST - Copyright Penn State and NEC