Harnessing Associative Computing for Sequence Alignment with Parallel Accelerators

Shannon I. Steinfadt
Doctoral Research Showcase III, Room 17 A/B, 4:00-4:15
International Conference for High Performance Computing, Networking, Storage and Analysis (SC 08)
Advisor: Dr. Johnnie W. Baker
Parallel and Associative Computing Lab, Computer Science Department, Kent State University

Outline
  Introduction to Bioinformatics
  Local Sequence Alignment
  ASC
  About SWAMP
  SWAMP+
  ASC on Metal
  Dynamic Programming - Automatic Parallelization
  Conclusion
  Questions?
  [Figure: example of a gapped alignment between two DNA sequences]

What is Bioinformatics?
  Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.*
  The ultimate goal:
    Enable the discovery of new biological insights
    Create a global perspective from which unifying principles in biology can be discerned
  Human Xp DNA (180 of 5303 base pairs shown):
    aattctaatt tctttccatg gagtttttca ttagatccag aaaaaagaag tcaatctctt
    tttacaaact actgccctaa agaatcatac tttaatccgt tggaggggta agactgcact
    gtgacatgac tatagaaagt agatttgtat cctagttcta ttatccatgt gtgtaaggca
  *Definition from NCBI

Pairwise Local Sequence Alignment
  Search for regions of high similarity between two strings
    Similar Characters
    Similar Structure
    Similar Function
  Local sequence alignment is one of the most common fundamental tasks

Pairwise Local Sequence Alignment
  Search for regions of high similarity between two strings
    Similar Characters
    Similar Structure
    Similar Function
  Homologous Sequences (derived by humans)
    Ancestral Relationships
    Gene Functionality
    Aid in Drug Discovery
  (preserved by evolution)

Goals for Sequence Alignment
  Provide alignments that are:
    Accurate (use Smith-Waterman)
    Fast
    More detailed
  Sequence alignment is one of the most used operations in bioinformatics

Sequence Alignment Methods
  Two possible approaches:
    Heuristics (approximations): e.g. BLAST, mpiBLAST, FastA
      The more efficient the heuristic, the worse the quality of the results usually is
    Exact algorithms: JAligner, MPSRCH, Smith-Waterman
  Parallel processing: get high-quality results in less time (using the Smith-Waterman algorithm)

Sequence Alignment Methods: Speed vs. Quality
  [Figure: BLAST, FastA, and Smith-Waterman placed along two axes, search speed (slower to faster) and data quality (lower to higher)]

Traceback in the Smith-Waterman Algorithm
  1) Find the maximum computed value
  Cost key:
    Match          +10
    Miss            -3
    Insert a gap    -3
    Extend a gap    -1

Aligning using Smith-Waterman Algorithm
  Compare all possible combinations of sequence characters against each other
  Cost key:
    Match          +10
    Miss            -3
    Insert a gap    -3
    Extend a gap    -1


Aligning using Smith-Waterman Algorithm
  Compare all possible combinations, subject to the data dependencies of dynamic programming
  Cost key:
    Match          +10
    Miss            -3
    Insert a gap    -3
    Extend a gap    -1
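
The data dependencies can be made explicit as a recurrence. What follows is the standard affine-gap form of Smith-Waterman with the cost key plugged in. The symbols H, E, F and s are not on the slide, and the reading of the gap costs (the first gap character costs 3, each further one costs 1) is an assumption.

```latex
\begin{aligned}
E_{i,j} &= \max\{\, H_{i,j-1} - 3,\; E_{i,j-1} - 1 \,\} \\
F_{i,j} &= \max\{\, H_{i-1,j} - 3,\; F_{i-1,j} - 1 \,\} \\
H_{i,j} &= \max\{\, 0,\; H_{i-1,j-1} + s(a_i, b_j),\; E_{i,j},\; F_{i,j} \,\} \\
s(a_i, b_j) &= \begin{cases} +10 & \text{if } a_i = b_j \\ -3 & \text{otherwise} \end{cases}
\qquad H_{i,0} = H_{0,j} = 0
\end{aligned}
```

Each cell depends only on its left, upper, and upper-left neighbors, which is what constrains the order in which cells can be computed.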



Traceback in the Smith-Waterman Algorithm
  1) Find the maximum computed value
  2) Trace back until you reach a 0
  Alignment:
    CATTG
    C--TG
  Cost key:
    Match          +10
    Miss            -3
    Insert a gap    -3
    Extend a gap    -1
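
The two traceback steps can be sketched in sequential Python. This is a minimal illustration, not the presenter's implementation. It assumes affine gaps in which the first gap character costs 3 and each further one costs 1, and it uses a hypothetical second sequence "CTG", since only the aligned form C--TG appears on the slide. The function name `align` is likewise made up for this sketch.

```python
MATCH, MISS, GAP_OPEN, GAP_EXTEND = 10, -3, -3, -1  # the slide's cost key
NEG = float("-inf")


def align(a, b):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ending in a gap in a
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ending in a gap in b
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + GAP_OPEN, E[i][j - 1] + GAP_EXTEND)
            F[i][j] = max(H[i - 1][j] + GAP_OPEN, F[i - 1][j] + GAP_EXTEND)
            s = MATCH if a[i - 1] == b[j - 1] else MISS
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:  # step 1: remember the maximum computed value
                best, bi, bj = H[i][j], i, j

    # Step 2: walk back from the maximum until a 0 is reached.
    out_a, out_b = [], []
    i, j, state = bi, bj, "H"
    while True:
        if state == "H":
            if H[i][j] == 0:
                break
            s = MATCH if a[i - 1] == b[j - 1] else MISS
            if H[i][j] == H[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # gap in a: consume one character of b
            out_a.append("-")
            out_b.append(b[j - 1])
            if E[i][j] == H[i][j - 1] + GAP_OPEN:
                state = "H"
            j -= 1
        else:               # gap in b: consume one character of a
            out_a.append(a[i - 1])
            out_b.append("-")
            if F[i][j] == H[i - 1][j] + GAP_OPEN:
                state = "H"
            i -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = align("CATTG", "CTG")
print(score)
print(top)
print(bottom)
```

Under these assumptions the sketch reproduces the slide's alignment, CATTG over C--TG, with a score of 26 (three matches at +10, one gap opened at -3 and extended once at -1).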

Goals for Sequence Alignment
  Provide alignments that are:
    Accurate (use Smith-Waterman)
    Fast
    More detailed
  Sequence alignment is one of the most used operations in bioinformatics

Motivation for Faster Alignment
  Sequences are analyzed by comparison with one or more databases
  The number of comparisons is proportional to the product of query size and database size,
  i.e. your sequence size * the size of each sequence * the number of sequences
    262 * 366 * 1,000,000 = 95,892,000,000 comparisons
  The number of base pairs in GenBank doubles roughly every 18 months
    85,759,586,764 bases in 82,853,685 sequence records (February 2008)
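
The arithmetic on this slide can be checked with a few lines of Python. The function names are hypothetical, and the growth projection is only a naive extrapolation that assumes the 18-month doubling period stays constant.

```python
def cell_comparisons(query_len, subject_len, num_sequences):
    """Character comparisons when one query is aligned against every database sequence."""
    return query_len * subject_len * num_sequences


def projected_bases(current_bases, months_ahead, doubling_months=18):
    """Naive extrapolation, assuming a constant doubling period."""
    return current_bases * 2 ** (months_ahead / doubling_months)


print(f"{cell_comparisons(262, 366, 1_000_000):,}")   # 95,892,000,000
print(f"{projected_bases(85_759_586_764, 36):,.0f}")  # 343,038,347,056
```

Two doublings over 36 months would quadruple the February 2008 figure, and the number of comparisons per query grows by the same factor.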

Genomic Databases
  [Figure: growth of the International Nucleotide Sequence Database Collaboration (INSDC), in billions of base pairs, contributed by EMBL, DDBJ, and GenBank]
  Exponential growth of public sequence data means more to align with; the faster an alignment, the better.

Get It, Got It, Good (or Better)
  Have These:
    ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc
    gggccctcct ggctcccaac agcttctcag ttcccacttc
  Want This: [Figure: alignment]
  Use This: Associative SIMD Model - ASC

Get It, Got It, Good (or Better)
  Have These:
    ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc
    gggccctcct ggctcccaac agcttctcag ttcccacttc
  Want This: [Figure: alignment]
  Use This: ClearSpeed Advance 620 PCI-X board
    50 GFLOPS peak performance
    25W average power dissipation

Get It, Got It, Good (or Better)
  Have These:
    ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc
    gggccctcct ggctcccaac agcttctcag ttcccacttc
  Want This: [Figure: alignment]
  Use This: NVIDIA C870 Tesla GPGPU
    518 peak GFLOPS on Tesla
    170W peak, 120W typical

ASC: Associative Architecture
  SIMD with special associative features
  Fine-grained parallelism
  Designed for fast associative searches
    Searches are content-based, not by memory address

ASC Advantages
  Quick data movement in SIMD
    Move raw data in parallel
    At each step, PEs follow the algorithmic steps for data movement in lock step
  No message passing like MPI/PVM
    No store/forward
    No headers
    No explicit synchronizing

ASC: Associative Architecture
  Very fast operations for:
    Finding Maximum / Minimum
    Finding if there are Any Responders
    Pick One active PE
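
The semantics of these operations can be mirrored in a toy sequential model. This sketch only shows what the operations return. It says nothing about how ASC hardware performs them quickly, and the class and method names are invented for illustration.

```python
class AssociativeArray:
    """Toy model: one value per PE plus a mask of responders (active PEs)."""

    def __init__(self, values):
        self.values = list(values)
        self.mask = [True] * len(self.values)  # every PE starts out active

    def search(self, predicate):
        """Content-based search: each PE tests its own value (conceptually at once)."""
        self.mask = [predicate(v) for v in self.values]

    def any_responders(self):
        """True if at least one PE matched the last search."""
        return any(self.mask)

    def pick_one(self):
        """Index of one responder (here the lowest-numbered), or None."""
        for pe, active in enumerate(self.mask):
            if active:
                return pe
        return None

    def _responding_values(self):
        return [v for v, active in zip(self.values, self.mask) if active]

    def maximum(self):
        """Largest value among responders, or None if there are none."""
        vals = self._responding_values()
        return max(vals) if vals else None

    def minimum(self):
        """Smallest value among responders, or None if there are none."""
        vals = self._responding_values()
        return min(vals) if vals else None


pes = AssociativeArray([7, 26, 14, 26, 3])  # e.g. one score per PE
top = pes.maximum()                         # 26
pes.search(lambda v: v == top)              # which PEs hold the maximum?
print(pes.any_responders(), pes.pick_one()) # True 1
```

In Smith-Waterman terms, this is the pattern needed for step 1 of the traceback: find the maximum score, then pick one PE that holds it.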

Parallelizing the Smith-Waterman Algorithm
  [Figures: step-by-step parallelization of the matrix computation; one panel is labeled with the sequence C A T T G]

SWAMP (Smith-Waterman using Associative Massive Parallelism)
  [Figure: order of computations, showing used PEs and unused PEs]
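
The slide's figure is not recoverable, so the sketch below assumes the usual wavefront order for dynamic programming: all cells on one anti-diagonal depend only on earlier anti-diagonals and can be computed in the same parallel step. Assigning one PE per character of the first sequence is also an assumption, as are the function names.

```python
def wavefront_schedule(len_a, len_b):
    """For each parallel step, list the (i, j) cells that can be computed together."""
    steps = []
    for d in range(len_a + len_b - 1):          # d indexes the anti-diagonal i + j
        cells = [(i, d - i) for i in range(len_a) if 0 <= d - i < len_b]
        steps.append(cells)
    return steps


def pe_usage(len_a, len_b):
    """(used, unused) PE counts per step, assuming one PE per character of sequence a."""
    return [(len(cells), len_a - len(cells))
            for cells in wavefront_schedule(len_a, len_b)]


# Hypothetical sizes: a 5-character sequence against a 3-character one.
for step, (used, unused) in enumerate(pe_usage(5, 3)):
    print(f"step {step}: {used} used, {unused} unused")
```

A 5 by 3 matrix takes 5 + 3 - 1 = 7 parallel steps instead of 15 sequential cell updates, with fewer PEs busy at the start and end of the wavefront.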

Goals for Sequence Alignment
  Provide alignments that are:
    Fast
    Accurate (use Smith-Waterman)
    More detailed
  The highest accuracy together with higher information content would be better

SWAMP+
  SWAMP+ returns multiple non-overlapping sequences
    Search and process with SWAMP multiple times
    Return the top k non-overlapping, non-intersecting sequences
  Reveal additional information
    Spatial information
    Length of comparisons
    Identify regulatory regions and motifs
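
The "search multiple times, return the top k" idea can be sketched as repeated local alignment with masking. This is not SWAMP+ itself. It is a sequential simplification that assumes a single linear gap cost of -3 (instead of the affine cost key) and assumes that already-aligned regions are excluded by masking them in both sequences. All names and the example sequences are hypothetical.

```python
MATCH, MISS, GAP = 10, -3, -3  # simplification: one linear gap cost
MASK = None                    # marks characters used by an earlier alignment


def local_align(a, b):
    """Best local alignment as (score, (a_start, a_end), (b_start, b_end))."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] is MASK or b[j - 1] is MASK:
                continue  # cell stays 0, so no alignment can cross a masked region
            s = MATCH if a[i - 1] == b[j - 1] else MISS
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    i, j = bi, bj
    while H[i][j] > 0:  # trace back to find where the alignment starts
        s = MATCH if a[i - 1] == b[j - 1] else MISS
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return best, (i, bi), (j, bj)


def top_k_alignments(a, b, k):
    """Up to k non-overlapping local alignments, best first."""
    a, b = list(a), list(b)
    results = []
    for _ in range(k):
        score, (a0, a1), (b0, b1) = local_align(a, b)
        if score <= 0:
            break  # nothing left to align
        results.append((score, (a0, a1), (b0, b1)))
        a[a0:a1] = [MASK] * (a1 - a0)  # exclude both regions from later rounds
        b[b0:b1] = [MASK] * (b1 - b0)
    return results


for hit in top_k_alignments("CATTGAAAAAGGCCT", "GGCCTTTTTTCATTG", 3):
    print(hit)
```

In this example the two shared regions appear in opposite order in the two strings, so no single local alignment can cover both. A second pass finds the second region, and a third pass finds nothing and stops early.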

ASC on Metal
  ASC: SIMD with additional features
    Associative functions
    Associative search: search via content, not memory address
  Associative Functions
    ClearSpeed SIMD Accelerator (64-bit FP)
      50 GFLOPS peak performance
      25W average power dissipation
    NVIDIA Tesla GPGPU Stream Processor (32-bit FP)
      518 peak GFLOPS on Tesla Series
      170W peak, 120W typical

ASC on Metal
  Associative Functions
    NVIDIA Tesla GPGPU x 2

GPGPU Internal Organization
  Multiple levels of parallelism
    Up to 512 threads per block, communicating through shared memory
    Grids of thread blocks
  SPMD computation model
    All data processed by the same program (kernel)
  From "Scalable Parallel Programming with CUDA" by John Nickolls et al., in GPUs for Parallel Programming, ACM Queue Vol. 6, No. 2, March/April 2008.

ASC to GPGPU Mapping
  ASC                                   GPGPU
  PE                                    Thread
  Local memory (solely the PE's)        Local memory (solely the thread's)
  PE Interconnection Network            Per-block Shared Memory
  All PEs                               Block (limited here to 512 separate threads per block)

  Multiple ASC Model (MASC)             GPGPU
  Multiple Instruction Streams          Multiple Blocks
  Multiple MASC programs                Multiple Grids

Q & A
  Contact Info:
    Shannon Steinfadt
    ssteinfa@cs.kent.edu
    http://www.cs.kent.edu/~ssteinfa

References
  CUDA Information
    J. Nickolls, I. Buck, M. Garland, K. Skadron, Scalable Parallel Programming with CUDA, ACM Queue Magazine, pp. 41-53, March/April 2008.
  Parallel Sequence Alignment: Others
    S. A. Manavski and G. Valle, CUDA Compatible GPU Cards as Efficient Hardware Accelerators for Smith-Waterman Sequence Alignment, BMC Bioinformatics, March 2008.
    M. Farrar, Striped Smith-Waterman Speeds Database Searches Six Times over Other SIMD Implementations, Bioinformatics, pp. 156-161, Jan. 2007.
    T. Rognes and E. Seeberg, Six-fold Speed-up of Smith-Waterman Sequence Database Searches Using Parallel Processing on Common Microprocessors, Bioinformatics 16(8): 699-706, 2000.
    W. Liu, B. Schmidt, G. Voss, A. Schröder, and W. Müller-Wittig, Bio-Sequence Database Scanning on a GPU, Proc. 20th IEEE Int'l Parallel and Distributed Processing Symp., High Performance Computational Biology (HiCOMB) Workshop, 2006.