Cloud computing for genome science and methods
1 Cloud computing for genome science and methods Ben Langmead Department of Biostatistics
2 Sequencing throughput GA II: 1.6 billion bp per day (2008); GA IIx: 5 billion bp per day (2009); HiSeq: billion bp per day (2010). Images: Numbers: Dates: Illumina press releases
3 Sequencing throughput End of 2009 Mid 2010 Late 2011/Early 2012 SOLiD 3+ System Up to 50 Gb Source: www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_ pdf
5 Computational throughput Moore's Law: The number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years.
6 Computational throughput Source: en.wikipedia.org/wiki/moore%27s_law
7 Computational throughput Core 2 Duo 386 Pentium Source: en.wikipedia.org/wiki/moore%27s_law
8 Throughput growth gap Sequencing throughput: > 4-5x per year. Computational throughput: 2x per 2 years.
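The two rates on this slide can be compared on a common per-year basis. The sketch below is illustrative only: it reads "4-5x per year" as sequencing throughput growth and "2x per 2 years" as Moore's Law growth, and the function names and the choice of 4x as the default are assumptions made for the example.

```python
import math


def annual_factor(growth, period_years):
    """Convert 'growth x every period_years years' into a per-year factor."""
    return growth ** (1.0 / period_years)


def gap_after(years, seq_per_year=4.0, compute_growth=2.0, compute_period=2.0):
    """Ratio of sequencing growth to compute growth after `years` years."""
    compute_per_year = annual_factor(compute_growth, compute_period)
    return (seq_per_year / compute_per_year) ** years


if __name__ == "__main__":
    for years in (1, 2, 3, 5):
        print(f"after {years} yr: gap is about {gap_after(years):.1f}x")
```

Using the lower bound of 4x per year, sequencing grows 16x in two years while compute grows 2x, so the gap is 8x after two years and keeps compounding.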
9 Throughput growth gap = Idle
10 Throughput growth gap = Faster algorithms
11 Throughput growth gap =
13 Cloud computing
14 Cloud computing 1. Rent, don't buy = :: Cloud vendor :: Electric Company
15 Cloud computing 2. Large, centralized, efficient. Columbia River for cheap hydroelectric power, cooling. Source: nytimes.com
16 Cloud computing Why? Cost? Handles demand that grows, shrinks dramatically. No hardware maintenance. No alternative? Why not? Cost? Harder to program. Less user-friendly. Data movement is inconvenient & can outpace network. Privacy (e.g. IRB).
17 Cloud computing Publications (2010): Stein LD. The case for cloud computing in genome informatics. Genome Biology. 2010;11(5):207. Epub 2010 May 5. Schatz MC, Langmead B, Salzberg SL. Cloud computing and the DNA data race. Nature Biotechnology Jul;28(7): Baker M. Next-generation sequencing: adjusting to data overload. Nature Methods 2010, 7: Sansom C. Up in a cloud? Nature Biotechnology 28, Blogs: 1. PolITiGenomics 2. Informatics Iron 3. business bytes genes molecules
18 Myrna
19 Myrna [Figure, built up across slides 19-26: short reads from Sample A and Sample B shown against the Gene 1 reference sequence; successive steps labeled Overlap, Normalize, Statistics; final result: Gene 1 differentially expressed?: YES p-value: ]
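The Overlap and Normalize steps named on these slides can be sketched on toy data. This is a minimal illustration and not Myrna's code: the interval coordinates, the sample data, the function names, and the counts-per-million normalization are all assumptions chosen for the example, and the Statistics step (the test that yields a p-value) is left out.

```python
from collections import Counter


def overlaps(read_start, read_len, gene_start, gene_end):
    """True if the half-open read interval intersects the half-open gene interval."""
    return read_start < gene_end and read_start + read_len > gene_start


def count_overlaps(alignments, genes):
    """Count reads per gene.

    alignments: iterable of (start, length) for aligned reads.
    genes: dict mapping gene name to (start, end).
    """
    counts = Counter()
    for start, length in alignments:
        for name, (g_start, g_end) in genes.items():
            if overlaps(start, length, g_start, g_end):
                counts[name] += 1
    return counts


def normalize(counts, total_reads):
    """Toy normalization: counts per million reads (an assumption, not Myrna's method)."""
    return {gene: c * 1e6 / total_reads for gene, c in counts.items()}


# Hypothetical data: one gene and two samples of 16 bp reads.
GENES = {"gene1": (100, 200)}
SAMPLE_A = [(90, 16), (150, 16), (190, 16), (300, 16)]
SAMPLE_B = [(95, 16), (400, 16)]

if __name__ == "__main__":
    for label, sample in (("A", SAMPLE_A), ("B", SAMPLE_B)):
        raw = count_overlaps(sample, GENES)
        print(label, dict(raw), normalize(raw, len(sample)))
```

Normalizing matters because the samples differ in depth: Sample A has three overlapping reads out of four, Sample B has one out of two, and only the scaled values are comparable.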
27 Myrna Stages: Overlap, Normalize, Statistics. Parallel by read; parallel by genome bin; parallel by sample; parallel by gene.
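"Parallel by genome bin" can be pictured as a map, shuffle, and reduce pass in which the bin is the grouping key, so each bin can be processed independently on a different worker. The sketch below runs in a single process and is not Myrna's implementation: `BIN_SIZE`, the record layout, and all function names are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 1000  # hypothetical genome-bin width in bases


def map_by_bin(record):
    """Map step: key an aligned read by the genome bin containing its start."""
    _sample, chrom, pos = record
    return (chrom, pos // BIN_SIZE), record


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bin(key, records):
    """Reduce step: per-sample read counts within one bin."""
    counts = defaultdict(int)
    for sample, _chrom, _pos in records:
        counts[sample] += 1
    return key, dict(counts)


def run(records):
    groups = shuffle(map_by_bin(r) for r in records)
    return dict(reduce_bin(k, v) for k, v in groups.items())


# Hypothetical aligned reads: (sample, chromosome, start position).
RECORDS = [
    ("A", "chr1", 150),
    ("A", "chr1", 980),
    ("B", "chr1", 1200),
    ("B", "chr1", 10),
    ("A", "chr2", 5),
]

if __name__ == "__main__":
    for key, counts in sorted(run(RECORDS).items()):
        print(key, counts)
```

Swapping the key (read identifier, sample name, or gene name) gives the other partitionings listed on the slide; the reduce calls never share state, which is what lets the work spread across nodes.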
31 Myrna
Myrna runtime and cost for 1.1 billion reads from the Pickrell et al. study

EC2 nodes                                   1 master, 10 workers   1 master, 20 workers   1 master, 40 workers
Worker CPU cores
Wall clock time                             4h:20m                 2h:32m                 1h:38m
Cluster setup                               4m                     4m                     3m
[label missing]                             2h:56m                 1h:31m                 54m
Overlap                                     52m                    31m                    16m
Normalize                                   6m                     7m                     6m
Statistics                                  9m                     6m                     6m
Summarize & Postprocess                     13m                    14m                    13m
Approximate cost (N. Virginia / Elsewhere)  $44.00 / $49.50        $50.40 / $56.70        $65.60 / $73.80

Table 1. Timing and cost for a Myrna experiment with 1.1 billion 35 bp unpaired reads from the Pickrell et al. study as input. Costs are approximate and based on the pricing as of this writing, that is, $0.68 per extra-large high-CPU EC2 node per hour in the Northern Virginia zone and $0.78 in other zones, plus a $0.12 per-node-per-hour surcharge for Elastic MapReduce in all zones. Times can vary subject to, for example, congestion and Internet traffic conditions. Data transfer & preprocessing adds 1h:15m and $12.
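The cost figures in Table 1 can be reproduced from the per-hour rates in the caption. The sketch below assumes that every node (master plus workers) is billed at the same rate and that a partially used hour is billed as a full hour. Neither assumption is stated on the slide, but together they reproduce all six table entries. Function and constant names are made up for the example.

```python
# Rates in cents per node per hour, taken from the Table 1 caption.
EC2_RATE_CENTS = {"N. Virginia": 68, "Elsewhere": 78}
EMR_SURCHARGE_CENTS = 12


def billed_hours(hours, minutes):
    """Assumption: any started hour is billed as a full hour."""
    return hours + (1 if minutes > 0 else 0)


def cluster_cost(workers, hours, minutes, region):
    """Approximate cost in dollars for one master plus `workers` worker nodes."""
    nodes = workers + 1
    rate_cents = EC2_RATE_CENTS[region] + EMR_SURCHARGE_CENTS
    return nodes * billed_hours(hours, minutes) * rate_cents / 100


if __name__ == "__main__":
    runs = [(10, 4, 20), (20, 2, 32), (40, 1, 38)]  # workers, wall-clock h, m
    for workers, h, m in runs:
        nv = cluster_cost(workers, h, m, "N. Virginia")
        other = cluster_cost(workers, h, m, "Elsewhere")
        print(f"{workers} workers: ${nv:.2f} / ${other:.2f}")
```

Because of hourly rounding, the 40-worker run takes 38% of the 10-worker wall-clock time but costs about 49% more, which is the trade-off the table illustrates.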
34 Acknowledgements: Jeffrey Leek, Kasper Hansen, Rafael Irizarry, Hector Corrada Bravo, Margaret Taub, Michael Schatz, Jimmy Lin, Mihai Pop, Steven Salzberg, Deepak Singh, Peter Sirota. Myrna website: Paper:
More informationProcessor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)
More informationFundamentals of Computers Design
Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2
More informationFault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together
Fault-Tolerant Computer System Design ECE 695/CS 590 Putting it All Together Saurabh Bagchi ECE/CS Purdue University ECE 695/CS 590 1 Outline Looking at some practical systems that integrate multiple techniques
More informationUsing Secure Computation for Statistical Analysis of Quantitative Genomic Assay Data
Using Secure Computation for Statistical Analysis of Quantitative Genomic Assay Data Justin Wagner Ph.D. Candidate University of Maryland, College Park Advisor: Hector Corrada Bravo Genomic Assay Analysis
More informationVon Neumann architecture. The first computers used a single fixed program (like a numeric calculator).
Microprocessors Von Neumann architecture The first computers used a single fixed program (like a numeric calculator). To change the program, one has to re-wire, re-structure, or re-design the computer.
More informationCloud Computing Paradigms for Pleasingly Parallel Biomedical Applications
Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics, Pervasive Technology Institute Indiana University
More informationLecture 2: Performance
Lecture 2: Performance Today s topics: Technology wrap-up Performance trends and equations Reminders: YouTube videos, canvas, and class webpage: http://www.cs.utah.edu/~rajeev/cs3810/ 1 Important Trends
More informationEECS 570 Lecture 25 Genomics and Hardware Multi-threading
Lecture 25 Genomics and Hardware Multi-threading Winter 2018 Prof. Satish Narayanasamy http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin,
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 25: Parallel Databases CSE 344 - Winter 2013 1 Announcements Webquiz due tonight last WQ! J HW7 due on Wednesday HW8 will be posted soon Will take more hours
More information2012 Business Continuity Management for CRISIS. Network Infrastructure for BCM
2012 Business Continuity Management for CRISIS Network Infrastructure for BCM FACTS about lack of DR Planning After the incident of the World Trade Center, 40% of the companies without disaster recovery
More informationCENG4480 Lecture 09: Memory 1
CENG4480 Lecture 09: Memory 1 Bei Yu byu@cse.cuhk.edu.hk (Latest update: November 8, 2017) Fall 2017 1 / 37 Overview Introduction Memory Principle Random Access Memory (RAM) Non-Volatile Memory Conclusion
More informationSAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS
SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS 1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights
More information