Volume 14 Nr. 3
Abstract 7 - GPU-ACCELERATED RNA-RNA INTERACTION ALGORITHM
Rizk Guillaume*, Lavenier Dominique - IRISA-Symbiose, Rennes, France
1D) Molecular structure prediction, modelling and dynamics
Motivation: Many bioinformatics studies require the analysis of RNA or DNA structures. Packages such as Unafold (Markham, N. R. & Zuker, M. Nucleic Acids Res. 2005; 33, W577-W581) provide many tools for studying secondary structures. However, the high computational complexity of these algorithms, combined with the rapid growth of genomic data, creates a need for faster methods. Current approaches either (1) design faster algorithms or (2) parallelize the work on multiprocessor systems. Here, we explore the use of graphics processing units (GPUs) to speed up this kind of computation, since GPUs may offer a higher performance/cost ratio than clusters. GPUs have already been used successfully to compute Smith-Waterman alignments (Manavski, S. A. & Valle, G. BMC Bioinformatics 2008; 9 Suppl 2). We propose to parallelize on GPU the hybrid function of the Unafold package, which computes the stability of the duplex formed by two RNA sequences.

Methods: An efficient GPU parallelization needs thousands of independent tasks. The most time-consuming parts of the program are identified by profiling and then rewritten to expose parallelism. Our GPU implementation exploits both parallelism within a single run of the algorithm and parallelism across several runs of the algorithm on multiple pairs of sequences. Moreover, to achieve good performance, the data needed by the algorithm must be carefully distributed among the different memory spaces of the GPU according to their size and access pattern. Another difficulty is the need to keep if-then-else control instructions in the GPU kernels to a minimum: the GPU is a SIMD (single instruction, multiple data) architecture, so threads of the same group that take different branches are executed one branch after the other.

Results: Experiments were run on an octo-core platform (2 × Xeon E5430 2.66 GHz, 8 GB RAM) with two NVIDIA Tesla cards.
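Parallelism within a single run of a two-sequence dynamic program is commonly obtained with a wavefront schedule: the matrix is filled one anti-diagonal at a time, and all cells of a diagonal are independent. The following is a minimal sketch, assuming a simplified toy recurrence in which cell (i, j) depends only on (i-1, j-1), (i-1, j) and (i, j-1), with hypothetical pairing scores; `PAIRS`, `GAP`, `pair_score`, `cell` and the fill functions are illustrative names. It is not Unafold's energy model, and the abstract does not say how the authors decomposed the hybrid computation. It shows that the anti-diagonal order yields the same matrix as the usual row-by-row order.

```python
# Hypothetical toy scores, NOT Unafold's nearest-neighbour energies:
# Watson-Crick and G-U wobble pairs are favourable (negative), all else is not.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAP = 1  # hypothetical penalty for leaving a base unpaired


def pair_score(x, y):
    return -1 if (x, y) in PAIRS else 1


def cell(E, a, b, i, j):
    """Value of cell (i, j); it reads only neighbours with smaller i or j."""
    return min(
        0,  # local model: a duplex may start at any position
        E[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
        E[i - 1][j] + GAP,
        E[i][j - 1] + GAP,
    )


def fill_row_major(a, b):
    """Reference order: plain nested loops, one cell after another."""
    E = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            E[i][j] = cell(E, a, b, i, j)
    return E


def fill_wavefront(a, b):
    """Same matrix, filled one anti-diagonal d = i + j at a time.

    Cells on diagonal d read only diagonals d - 1 and d - 2, so the inner
    loop has no dependency between its iterations: on a GPU each (i, d - i)
    could be computed by its own thread, with one synchronisation per diagonal.
    """
    n, m = len(a), len(b)
    E = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            E[i][d - i] = cell(E, a, b, i, d - i)
    return E


a, b = "GGCC", "CCGG"  # b assumed written 3'->5', so a[i] faces b[i]
assert fill_wavefront(a, b) == fill_row_major(a, b)
print(min(min(row) for row in fill_wavefront(a, b)))  # -4: four consecutive pairs
```

Any recurrence whose cells depend only on cells with smaller i and smaller j can be scheduled the same way; only the body of `cell` changes.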
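Parallelism across pairs of sequences matters because one small matrix exposes few independent cells per step. Assuming a wavefront schedule in which only the cells of one anti-diagonal are ready at a time, a 50 × 50 matrix never offers more than 50, far from the thousands of tasks a GPU needs. The hypothetical helpers below (`diagonal_cells`, `wavefront_tasks`, `tasks_at_step`) count the tasks available when all pairs advance through their diagonals together, using the benchmark sizes reported in the Results (26,000 pairs of length-50 sequences).

```python
def diagonal_cells(n, m, d):
    """Number of cells (i, j), 1 <= i <= n, 1 <= j <= m, with i + j == d."""
    return max(0, min(n, d - 1) - max(1, d - m) + 1)


def wavefront_tasks(lengths, d):
    """All (pair, i, j) cells that can be computed concurrently at step d.

    lengths holds one (n, m) tuple per sequence pair. Different pairs never
    depend on each other, so the cells of every pair on the same
    anti-diagonal d form one pool of independent tasks.
    """
    return [
        (p, i, d - i)
        for p, (n, m) in enumerate(lengths)
        for i in range(max(1, d - m), min(n, d - 1) + 1)
    ]


def tasks_at_step(lengths, d):
    """Size of that pool, computed without materialising the task list."""
    return sum(diagonal_cells(n, m, d) for n, m in lengths)


one_pair = [(50, 50)]
batch = [(50, 50)] * 26000
print(max(tasks_at_step(one_pair, d) for d in range(2, 101)))  # 50
print(tasks_at_step(batch, 51))  # 1300000
```

With all pairs batched, the widest step offers 26,000 × 50 = 1,300,000 independent cells, well into the "thousands of independent tasks" regime.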
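Distributing data among GPU memory spaces by size and access pattern can be expressed as a small decision function. This is a hypothetical heuristic, not the authors' actual assignment, which the abstract does not detail. The capacities assume a compute-capability 1.3 device such as the Tesla C1060 (64 KB of constant memory, 16 KB of shared memory per multiprocessor), since the exact Tesla model is not given; `place` and all array sizes in the examples are illustrative.

```python
# Capacities ASSUMED for a compute-capability 1.3 device (e.g. Tesla C1060);
# the abstract does not state the exact Tesla model.
CONSTANT_MEM = 64 * 1024  # bytes; cached, broadcast when threads read one address
SHARED_MEM = 16 * 1024    # bytes per multiprocessor; on-chip, per thread block


def place(size_bytes, read_only, same_address_for_all_threads, private_to_block):
    """Hypothetical heuristic choosing a GPU memory space for one array."""
    if read_only and same_address_for_all_threads and size_bytes <= CONSTANT_MEM:
        return "constant"
    if private_to_block and size_bytes <= SHARED_MEM:
        return "shared"
    if read_only:
        return "texture"  # cached read-only path for scattered lookups
    return "global"


examples = {
    "a few global energy constants": place(64, True, True, False),
    "per-pair 51 x 51 int32 DP matrix": place(51 * 51 * 4, False, False, True),
    "26,000 input pairs, 100 bytes each": place(26000 * 100, True, False, False),
    "one int32 result per pair": place(26000 * 4, False, False, False),
}
for name, space in examples.items():
    print(f"{name}: {space}")
```

Constant memory is tested first because its cache serves one value to all threads reading the same address; read-only arrays accessed at scattered addresses fall through to the texture path, whose cache copes better with such reads on devices of that generation.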
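The SIMD constraint is usually addressed by replacing data-dependent branches with table lookups or arithmetic selects. A minimal sketch, assuming hypothetical pairing scores and helper names (`pair_score_branchy`, `PAIR_TABLE`, `min_branchless`); Python only mimics the data flow, since the benefit appears in the compiled GPU kernel, where all threads of a warp then follow the same instruction stream. Compilers can predicate very short branches on their own, so this matters most for longer if-then-else chains.

```python
BASES = "ACGU"
CODE = {base: k for k, base in enumerate(BASES)}  # hypothetical 2-bit base encoding


def pair_score_branchy(x, y):
    """Hypothetical scores written as an if-then-else chain.

    In a warp, threads that take different branches are run one branch
    after the other, so such chains waste GPU cycles.
    """
    if (x, y) in (("A", "U"), ("U", "A")):
        return -2
    elif (x, y) in (("G", "C"), ("C", "G")):
        return -3
    elif (x, y) in (("G", "U"), ("U", "G")):
        return -1
    else:
        return 0


# The same scores precomputed once into a 4x4 table: every thread then
# executes the same indexed load, whatever bases it is looking at.
PAIR_TABLE = [[pair_score_branchy(x, y) for y in BASES] for x in BASES]


def pair_score_table(x, y):
    return PAIR_TABLE[CODE[x]][CODE[y]]


def min_branchless(u, v):
    """Arithmetic select replacing 'if u < v'; valid for finite numbers."""
    c = int(u < v)  # 1 if u < v else 0
    return c * u + (1 - c) * v


assert all(pair_score_table(x, y) == pair_score_branchy(x, y)
           for x in BASES for y in BASES)
```

In a kernel, the bases would typically already be stored as small integer codes, so the lookup reduces to a single indexed load.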
We benchmarked our GPU implementation on 26,000 pairs of sequences, each of length 50, using one or two cards, against the CPU version of the algorithm running on one to eight cores. Total run times for the complete application are 100, 13.1, 9.8 and 5.3 seconds for 1 core, 8 cores, 1 card and 2 cards, respectively. GPUs are therefore a competitive alternative: a platform with two Tesla cards costs about the same as a platform with 8 CPU cores but delivers about 2.5 times the performance (5.3 s versus 13.1 s). Similar algorithms underlie many other functions, such as the prediction of the secondary structure of a single sequence, which might also be parallelized efficiently on GPUs.
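The speed-ups behind these figures can be checked with a few lines of arithmetic. Only the four run times come from the abstract; the names `TIMES` and `speedup` are illustrative. Since both platforms are stated to cost about the same, the performance/cost ratio between them reduces to the plain ratio of run times.

```python
# Run times (seconds) reported in the abstract for 26,000 pairs of length-50 sequences.
TIMES = {"1 core": 100.0, "8 cores": 13.1, "1 card": 9.8, "2 cards": 5.3}


def speedup(baseline, config):
    """How many times faster `config` is than `baseline`."""
    return TIMES[baseline] / TIMES[config]


for cfg in ("8 cores", "1 card", "2 cards"):
    print(f"{cfg} vs 1 core: {speedup('1 core', cfg):.1f}x")
# Equal platform prices make the performance/cost ratio equal to the speed-up.
print(f"2 cards vs 8 cores: {speedup('8 cores', '2 cards'):.2f}x")
```

This gives speed-ups of about 7.6, 10.2 and 18.9 over one core for 8 cores, 1 card and 2 cards, and about 2.47 for two cards versus eight cores, which rounds to the 2.5 quoted. Going from one card to two reduces the run time by a factor of about 1.85 (9.8 s to 5.3 s).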