Results 1 
4 of
4
B.: Accelerating SWHE based PIRs using GPUs
"... Abstract. In this work we focus on tailoring and optimizing the computational Private Information Retrieval (cPIR) scheme proposed in WAHC 2014 for efficient execution on graphics processing units (GPUs). Exploiting the mass parallelism in GPUs is a commonly used approach in speeding up cPIRs. Our ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Abstract. In this work we focus on tailoring and optimizing the computational Private Information Retrieval (cPIR) scheme proposed in WAHC 2014 for efficient execution on graphics processing units (GPUs). Exploiting the mass parallelism in GPUs is a commonly used approach in speeding up cPIRs. Our goal is to eliminate the efficiency bottleneck of the Doröz et al construction which would allow us to take advantage of its excellent bandwidth performance. To this end, we develop custom code to support polynomial ring operations and extend them to realize the evaluation functions in an optimized manner on high end GPUs. Specifically, we develop optimized CUDA code to support large degree/large coefficient polynomial arithmetic operations such as modular multiplication/reduction, and modulus switching. Moreover, we choose same prime numbers for both the CRT domain representation of the polynomials and for the modulus switching implementation of the somewhat homomorphic encryption scheme. This allows us to combine two arithmetic domains, which reduces the number of domain conversions and permits us to perform faster arithmetic. Our implementation achieves 1434 times speedup for index comparison and 418 times speedup for data aggregation compared to a pure CPU software implementation.
Accelerating Homomorphic Evaluation on Reconfigurable Hardware Extended Version
"... Abstract. Homomorphic encryption allows computation on encrypted data and makes it possible to securely outsource computational tasks to untrusted environments. However, all proposed schemes are quite inefficient and homomorphic evaluation of ciphertexts usually takes several seconds on highend C ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract. Homomorphic encryption allows computation on encrypted data and makes it possible to securely outsource computational tasks to untrusted environments. However, all proposed schemes are quite inefficient and homomorphic evaluation of ciphertexts usually takes several seconds on highend CPUs, even for evaluating simple functions. In this work we investigate the potential of FPGAs for speeding up those evaluation operations. We propose an architecture to accelerate schemes based on the ring learning with errors (RLWE) problem and specifically implemented the somewhat homomorphic encryption scheme YASHE, which was proposed by Bos, Lauter, Loftus, and Naehrig in 2013. Due to the large size of ciphertexts and evaluation keys, onchip storage of all data is not possible and external memory is required. For efficient utilization of the external memory we propose an efficient doublebuffered memory access scheme and a polynomial multiplier based on the number theoretic transform (NTT). For the parameter set (n = 16384, dlog2 qe = 512) capable of evaluating 9 levels of multiplications, we can perform a homomorphic addition in 0.94ms and a homomorphic multiplication in 48.67ms.
Accelerating Somewhat Homomorphic Evaluation using FPGAs
"... Abstract. After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a c ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract. After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in reallife applications. The accelerator we present is connected via a fast PCIe interface to a CPU platform to provide homomorphic evaluation services to any application that needs to support blinded computations. Specifically we introduce a number theoretical transform based multiplier architecture capable of efficiently handling very large polynomials. When synthesized for the Xilinx Virtex 7 family the presented architecture can compute the product of large polynomials in under 6.25 msec making it the fastest multiplier design of its kind currently available in the literature and is more than 102 times faster than a software implementation. Using this multiplier we can compute a relinearization operation in 526 msec. When used as an accelerator, for instance, to evaluate the AES block cipher, we estimate a per block homomorphic evaluation performance of 442 msec yielding performance gains of 28.5 and 17 times over similar CPU and GPU implementations, respectively.
cuHE: A Homomorphic Encryption Accelerator Library
"... Abstract. We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA h ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract. We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA handtuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using numbertheoretic transform (NTT) and Chinese remainder theorem (CRT) based methods are then extended to implement the primitives of the leveled homomorphic encryption scheme proposed by LópezAlt, Tromer and Vaikuntanathan. To compare the performance of the proposed CUDA library we implemented two applications: the Prince block cipher and homomorphic sorting algorithms on two GPU platforms in single GPU and multiple GPU configurations. We observed a speedup of 25 times and 51 times over the best previous GPU implementation for Prince with single and triple GPUs, respectively. Similarly for homomorphic sorting we obtained 1241 times speedup depending on the number and size of the sorted elements.