UC Berkeley CS61C : Machine Structures

Size: px
Start display at page:

Download "UC Berkeley CS61C : Machine Structures"

Transcription

1 inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 39 Software Parallel Computing CS61C L39 Software Parallel Computing (1) Thanks to Prof. Demmel for his CS267 slides & Andy Carle for 1 st CS61C draft Lecturer SOE Dan Garcia Five-straight big games! This one was supposed to be easy! Stanford showed up, and so did the winds as Cal struggled to beat SU, a 1-10 team and the weakest opponent all year. Next game is in the Holiday Bowl in San Diego vs 9-3 Texas A&M. calbears.cstv.com/sports/m-footbl/recaps/120206aaa.html

2 Outline Large, important problems require powerful computers Why powerful computers must be distributed or parallel computers What are these? Principles of parallel computing performance CS61C L39 Software Parallel Computing (2)

3 Simulation: The Third Pillar of Science Traditional scientific and engineering paradigm: Do theory or paper design Perform experiments or build system Limitations: Too difficult -- build large wind tunnels Too expensive -- build a throw-away passenger jet Too slow -- wait for climate or galactic evolution Too dangerous -- weapons, drug design, climate models Computational science paradigm: Use high performance computer (HPC) systems to simulate the phenomenon Based on physical laws & efficient numerical methods CS61C L39 Software Parallel Computing (3)

4 Example Applications Science Global climate modeling Biology: genomics; protein folding; drug design; malaria simulations Astrophysical modeling Computational Chemistry, Material Sciences and Nanosciences : Search for Extra-Terrestrial Intelligence Engineering Semiconductor design Earthquake and structural modeling Fluid dynamics (airplane design) Combustion (engine design) Crash simulation Computational Game Theory (e.g., Chess Databases) Business Rendering computer graphic imagery (CGI), ala Pixar and ILM Financial and economic modeling Transaction processing, web services and search engines Defense Nuclear weapons -- test by simulations Cryptography CS61C L39 Software Parallel Computing (4)

5 Performance Requirements Terminology FLOP FLoating point OPeration Flops/second standard metric for expressing the computing power of a system Example : Global Climate Modeling Divide the world into a grid (e.g. 10 km spacing) Solve fluid dynamics equations to determine what the air has done at that point every minute Requires about 100 Flops per grid point per minute This is an extremely simplified view of how the atmosphere works, to be maximally effective you need to simulate many additional systems on a much finer grid CS61C L39 Software Parallel Computing (5)

6 High Resolution Climate Modeling on NERSC-3 P. Duffy, et al., LLNL

7 Performance Requirements (2) Computational Requirements To keep up with real time (i.e. simulate one minute per wall clock minute): 8 Gflops/sec Weather Prediction (7 days in 24 hours): 56 Gflops/sec Climate Prediction (50 years in 30 days): 4.8 Tflops/sec Climate Prediction Experimentation (50 yrs in 12 hrs): 288 Tflops/sec Perspective Pentium 4 1.4GHz, 1GB RAM, 4x100MHz FSB ~0.3 Gflops/sec, effective Climate Prediction would take ~1233 years Reference: CS61C L39 Software Parallel Computing (7)

8 What Can We Do? Wait for our machines to get faster? Moore s law tells us things are getting better; why not stall for the moment? Heat issues Moore on last legs! Many believe so thus push for multi-core (fri)! Prohibitive costs : Rock s law Cost of building a semiconductor chip fabrication plant capable of producing chips in line w/moore s law doubles every four years In 2003, it cost $3 BILLION CS61C L39 Software Parallel Computing (8)

9 Upcoming Calendar Week # #15 This Week #16 Sun 2pm Review 10 Evans Mon Parallel Computing in Software Wed Parallel Computing in Hardware (Scott) Thu Lab I/O Networking & 61C Feedback Survey FINAL EXAM THU 12:30pm-3:30pm 234 Hearst Gym Final exam Same rules as Midterm, except you get 2 double-sided handwritten review sheets (1 from your midterm, 1 new one) + green sheet [Don t bring backpacks] Fri LAST CLASS Summary, Review, & HKN Evals CS61C L39 Software Parallel Computing (9)

10 Alternatively, many CPUs at once! Distributed computing Many computers (either homogeneous or heterogeneous) who get work units from a central dispatcher and return when they re done to get more { Grid, Cluster } Computing Parallel Computing Multiple processors all in one box executing identical code on different parts of problem to arrive at a unified (meaningful) solution Supercomputing CS61C L39 Software Parallel Computing (10)

11 Distributed Computing Let s tie together many disparate machines into one compute cluster These could all be the same (easier) or very different machines (harder) Common themes Dispatcher hands jobs & collects results Workers (get, process, return) until done Examples SETI@Home, BOINC, Render farms Google clusters running MapReduce CS61C L39 Software Parallel Computing (11)

12 Google s MapReduce Remember CS61A? (reduce + (map square '(1 2 3)) (reduce + '(1 4 9)) 14 We told you the beauty of pure functional programming is that it s easily parallelizable Do you see how you could parallelize this? What if the reduce function argument were associative, would that help? Imagine 10,000 machines ready to help you compute anything you could cast as a MapReduce problem! This is the abstraction Google is famous for authoring (but their reduce not exactly the same as the CS61A s reduce) It hides LOTS of difficulty of writing parallel code! The system takes care of load balancing, dead machines, etc CS61C L39 Software Parallel Computing (12)

13 MapReduce Programming Model Input & Output: each a set of key/value pairs Programmer specifies two functions: map (in_key, in_value) list(out_key, intermediate_value) Processes input key/value pair Produces set of intermediate pairs reduce (out_key, list(intermediate_value)) list(out_value) Combines all intermediate values for a particular key Produces a set of merged output values (usu just one) code.google.com/edu/parallel/mapreduce-tutorial.html CS61C L39 Software Parallel Computing (13)

14 Example : Word Occurrances in Files map(string input_key, String input_value): // input_key : document name // input_value: document contents for each word w in input_value: EmitIntermediate(w, "1"); reduce(string output_key, Iterator intermediate_values): // output_key : a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(AsString(result)); This is just an example Real code at Google is more complicated and in C++ Real code at Hadoop (open source MapReduce) in Java CS61C L39 Software Parallel Computing (14)

15 Example : MapReduce Execution file 1 file 2 file 3 file 4 file 5 file 6 file 7 ah ah er ah if or or uh or ah if map(string input_key, String input_value): // input_key : doc name // input_value: doc contents for each word w in input_value: EmitIntermediate(w, "1"); ah:1 ah:1 er:1 ah:1 if:1 or:1 or:1 uh:1 or:1 ah:1 if:1 ah:1,1,1,1 er:1 if:1,1 or:1,1,1 uh:1 reduce(string output_key, Iterator intermediate_values): // output_key : a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(AsString(result)); CS61C L39 Software Parallel Computing (15) (ah) (er) (if) (or) (uh)

16 How Do I Experiment w/mapreduce? 1. Work for Google. ;-) 2. Use the Open Source versions Hadoop: map/reduce & distributed file system: lucene.apache.org/hadoop/ Nutch: crawler, parsers, index : lucene.apache.org/nutch/ Lucene Java: text search engine library: lucene.apache.org/java/docs/ 3. Wait until 2007Fa (You heard it here 1 st!) Google will be giving us: Single Rack, 40 dual proc xeon, 4GB Ram, 600GB Disk Also available as vmware image We re developing bindings for Scheme for 61A It will be available to students & researchers CS61C L39 Software Parallel Computing (16)

17 Recent History of Parallel Computing Parallel Computing as a field exploded in popularity in the mid-1990s This resulted in an arms race between universities, research labs, and governments to have the fastest supercomputer in the world LINPACK (solving dense system of linear equations) is benchmark Source: top500.org CS61C L39 Software Parallel Computing (17)

18 Current Champions (Nov 2006) BlueGene/L eserver Blue Gene Solution IBM DOE / NNSA / LLNL Livermore, CA, United States 65,536 dual-processors, Tflops/s 0.7 GHz PowerPC 440, IBM unclassified, classified Red Storm Sandia / Cray Red Storm NNSA / Sandia National Laboratories Albuquerque, NM, United States 26,544 Processors, Tflops/s 2.4 GHz dual-core Opteron, Cray Inc. BGW - eserver Blue Gene Solution IBM TJ Watson Research Center Yorktown Heights, NY, United States 40,960 Processors, 91.3 Tflops/s 0.7 GHz PowerPC 440, IBM CS61C L39 Software Parallel Computing (18) top500.org

19 Synchronization (1) How do processors communicate with each other? How do processors know when to communicate with each other? How do processors know which other processor has the information they need? When you are done computing, which processor, or processors, have the answer? CS61C L39 Software Parallel Computing (19)

20 Synchronization (2) Some of the logistical complexity of these operations is reduced by standard communication frameworks Message Passing Interface (MPI) Sorting out the issue of who holds what data can be made easier with the use of explicitly parallel languages Unified Parallel C (UPC) Titanium (Parallel Java Variant) Even with these tools, much of the skill and challenge of parallel programming is in resolving these problems CS61C L39 Software Parallel Computing (20)

21 Parallel Processor Layout Options P1 P2 Pn $ $ $ bus Shared Memory memory P0 NI P1 NI Pn NI memory memory... memory Distributed Memory interconnect CS61C L39 Software Parallel Computing (21)

22 Parallel Locality We now have to expand our view of the memory hierarchy to include remote machines Remote memory behaves like a very fast network Bandwidth vs. Latency becomes important CS61C L39 Software Parallel Computing (22) Regs Instr. Operands Cache Blocks Memory Blocks Remote Memory Large Data Blocks Local and Remote Disk

23 Amdahl s Law Applications can almost never be completely parallelized Let s be the fraction of work done sequentially, so (1-s) is fraction parallelizable, and P = number of processors Speedup(P) = Time(1) / Time(P) 1 / ( s + ((1-s) / P) ), and as P 1/s Even if the parallel portion of your application speeds up perfectly, your performance may be limited by the sequential portion CS61C L39 Software Parallel Computing (23)

24 Parallel Overhead Given enough parallel work, these are the biggest barriers to getting desired speedup Parallelism overheads include: cost of starting a thread or process cost of communicating shared data cost of synchronizing extra (redundant) computation Each of these can be in the range of ms (many Mflops) on some systems Tradeoff: Algorithm needs sufficiently large units of work to run fast in parallel (I.e. large granularity), but not so large that there is not enough parallel work CS61C L39 Software Parallel Computing (24)

25 Load Imbalance Load imbalance is the time that some processors in the system are idle due to insufficient parallelism (during that phase) unequal size tasks Examples of the latter adapting to interesting parts of a domain tree-structured computations fundamentally unstructured problems Algorithms need to carefully balance load CS61C L39 Software Parallel Computing (25)

26 Peer Instruction of Assumptions 1. Writing & managing is relatively straightforward; just hand out & gather data 2. Most parallel programs that, when run on N (N big) identical supercomputer processors will yield close to N x performance increase 3. The majority of the world s computing power lives in supercomputer centers CS61C L39 Software Parallel Computing (26) ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

27 Peer Instruction Answer 1. The heterogeneity of the machines, handling machines that fail, falsify data. FALSE 2. The combination of Amdahl s law, overhead, and load balancing take its toll. FALSE 3. Have you considered how many PCs + game devices exist? Not even close. FALSE 1. Writing & managing SETI@Home is relatively straightforward; just hand out & gather data 2. Most parallel programs that, when run on N (N big) identical supercomputer processors will yield close to N x performance increase 3. The majority of the world s computing power lives in supercomputer centers ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT CS61C L39 Software Parallel Computing (27)

28 Summary Parallel & Distributed Computing is a multi-billion dollar industry driven by interesting and useful scientific computing & business applications It is extremely unlikely that sequential computing will ever again catch up with the processing power of parallel or distributed systems Programming parallel systems can be extremely challenging (distributed computing less so, thanks to APIs), but is built upon many of the concepts you ve learned this semester in 61c CS61C L39 Software Parallel Computing (28)

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #28: Parallel Computing 2005-08-09 CS61C L28 Parallel Computing (1) Andy Carle Scientific Computing Traditional Science 1) Produce

More information

CS61C : Machine Structures

CS61C : Machine Structures CS61C L28 Parallel Computing (1) inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #28: Parallel Computing 2005-08-09 Andy Carle Scientific Computing Traditional Science 1) Produce

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 42 Software Parallel Computing 2007-05-02 Thanks to Prof. Demmel for his CS267 slides & Andy Carle for 1 st CS61C draft www.cs.berkeley.edu/~demmel/cs267_spr05/

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #30 Parallel Computing 2007-8-15 Scott Beamer, Instructor Ion Wind Cooling Developed by Researchers at Purdue CS61C L30 Parallel Computing

More information

Lecture #30 Parallel Computing

Lecture #30 Parallel Computing CS61C L30 Parallel Computing (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #30 Parallel Computing 2007-8-15 Scott Beamer, Instructor Ion Wind Cooling Developed by Researchers at

More information

CS10 The Beauty and Joy of Computing

CS10 The Beauty and Joy of Computing CS10 The Beauty and Joy of Computing Lecture #19 Distributed Computing UC Berkeley EECS Lecturer SOE Dan Garcia 2010-11-08 Researchers at Indiana U used data mining techniques to uncover evidence that

More information

The Beauty and Joy of Computing

The Beauty and Joy of Computing The Beauty and Joy of Computing Lecture #19 Distributed Computing UC Berkeley Sr Lecturer SOE Dan By the end of the decade, we re going to see computers that can compute one exaflop (recall kilo, mega,

More information

CS4961 Parallel Programming. Lecture 1: Introduction 08/25/2009. Course Details. Mary Hall August 25, Today s Lecture.

CS4961 Parallel Programming. Lecture 1: Introduction 08/25/2009. Course Details. Mary Hall August 25, Today s Lecture. Parallel Programming Lecture 1: Introduction Mary Hall August 25, 2009 Course Details Time and Location: TuTh, 9:10-10:30 AM, WEB L112 Course Website - http://www.eng.utah.edu/~cs4961/ Instructor: Mary

More information

The Beauty and Joy of Computing

The Beauty and Joy of Computing The Beauty and Joy of Computing Lecture #18 Distributed Computing UC Berkeley Sr Lecturer SOE Dan By the end of the decade, we re going to see computers that can compute one exaflop (recall kilo, mega,

More information

Outline. Lecture 40 Hardware Parallel Computing Thanks to John Lazarro for his CS152 slides inst.eecs.berkeley.

Outline. Lecture 40 Hardware Parallel Computing Thanks to John Lazarro for his CS152 slides inst.eecs.berkeley. CS61C L40 Hardware Parallel Computing (1) inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 40 Hardware Parallel Computing 2006-12-06 Thanks to John Lazarro for his CS152 slides

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 40 Hardware Parallel Computing 2006-12-06 Thanks to John Lazarro for his CS152 slides inst.eecs.berkeley.edu/~cs152/ Head TA

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 35 Caches IV / VM I 2004-11-19 Andy Carle inst.eecs.berkeley.edu/~cs61c-ta Google strikes back against recent encroachments into the Search

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L41 Performance I (1) Lecture 41 Performance I 2004-12-06 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Sour Roses! Cal s best season

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #28: Parallel Computing + Summary 2006-08-16 CS61C L28 Parallel Computing (1) Andy Carle What Programs Measure for Comparison? Ideally

More information

Last time. Lecture #29 Performance & Parallel Intro

Last time. Lecture #29 Performance & Parallel Intro CS61C L29 Performance & Parallel (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #29 Performance & Parallel Intro 2007-8-14 Scott Beamer, Instructor Paper Battery Developed by Researchers

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 1 Introduction Li Jiang 1 Course Details Time: Tue 8:00-9:40pm, Thu 8:00-9:40am, the first 16 weeks Location: 东下院 402 Course Website: TBA Instructor:

More information

! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce!

! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce! inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 27 Performance II & Inter-machine Parallelism 2010-08-05!!!Instructor Paul Pearce! DENSITY LIMITS IN HARD DRIVES?! Yesterday Samsung! announced

More information

CS267 / E233 Applications of Parallel Computers. Lecture 1: Introduction 1/18/99

CS267 / E233 Applications of Parallel Computers. Lecture 1: Introduction 1/18/99 CS267 / E233 Applications of Parallel Computers Lecture 1: Introduction 1/18/99 James Demmel demmel@cs.berkeley.edu http://www.cs.berkeley.edu/~demmel/cs267_spr99 CS267 L1 Intro Demmel Sp 1999 Outline

More information

CS 61C: Great Ideas in Computer Architecture. MapReduce

CS 61C: Great Ideas in Computer Architecture. MapReduce CS 61C: Great Ideas in Computer Architecture MapReduce Guest Lecturer: Justin Hsia 3/06/2013 Spring 2013 Lecture #18 1 Review of Last Lecture Performance latency and throughput Warehouse Scale Computing

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS 61C L21 Caches II (1) Lecture 21 Caches II 24-3-1 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia US Buys world s biggest RAM disk. 2.5TB!

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L22 Caches II (1) CPS today! Lecture #22 Caches II 2005-11-16 There is one handout today at the front and back of the room! Lecturer PSOE,

More information

Introduction to MapReduce. Adapted from Jimmy Lin (U. Maryland, USA)

Introduction to MapReduce. Adapted from Jimmy Lin (U. Maryland, USA) Introduction to MapReduce Adapted from Jimmy Lin (U. Maryland, USA) Motivation Overview Need for handling big data New programming paradigm Review of functional programming mapreduce uses this abstraction

More information

MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat OSDI 2004 Presented by Zachary Bischof Winter '10 EECS 345 Distributed Systems 1 Motivation Summary Example Implementation

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 33 Caches III 2007-04-11 Future of movies is 3D? Dreamworks says they may exclusively release movies in this format. It s based

More information

Lecture 33 Caches III What to do on a write hit? Block Size Tradeoff (1/3) Benefits of Larger Block Size

Lecture 33 Caches III What to do on a write hit? Block Size Tradeoff (1/3) Benefits of Larger Block Size CS61C L33 Caches III (1) inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C Machine Structures Lecture 33 Caches III 27-4-11 Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Future of movies is 3D? Dreamworks

More information

Lecture #27 Parallelism in Software

Lecture #27 Parallelism in Software inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #27 Parallelism in Software 2008-8-06 http://fold.it/portal/adobe_main Amazon Mechanical Turk http://www.mturk.com/mturk/welcome Albert

More information

CSC 447: Parallel Programming for Multi- Core and Cluster Systems

CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Why Parallel Computing? Haidar M. Harmanani Spring 2017 Definitions What is parallel? Webster: An arrangement or state that permits several

More information

Distributed Computations MapReduce. adapted from Jeff Dean s slides

Distributed Computations MapReduce. adapted from Jeff Dean s slides Distributed Computations MapReduce adapted from Jeff Dean s slides What we ve learnt so far Basic distributed systems concepts Consistency (sequential, eventual) Fault tolerance (recoverability, availability)

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #24 Cache II 27-8-6 Scott Beamer, Instructor New Flow Based Routers CS61C L24 Cache II (1) www.anagran.com Caching Terminology When we try

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 30 Caches I 2006-11-08 Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Shuttle can t fly over Jan 1? A computer bug has

More information

Practice and Applications of Data Management CMPSCI 345. Lecture 18: Big Data, Hadoop, and MapReduce

Practice and Applications of Data Management CMPSCI 345. Lecture 18: Big Data, Hadoop, and MapReduce Practice and Applications of Data Management CMPSCI 345 Lecture 18: Big Data, Hadoop, and MapReduce Why Big Data, Hadoop, M-R? } What is the connec,on with the things we learned? } What about SQL? } What

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L26 RAID & Performance (1) Lecture #26 RAID & Performance CPS today! 2005-12-05 There is one handout today at the front and back of the room!

More information

CS6963. Course Information. Parallel Programming for Graphics Processing Units (GPUs) Lecture 1: Introduction. Source Materials for Today s Lecture

CS6963. Course Information. Parallel Programming for Graphics Processing Units (GPUs) Lecture 1: Introduction. Source Materials for Today s Lecture Course Information CS6963 Parallel Programming for Graphics Processing Units (GPUs) Lecture 1: Introduction 1 CS6963: Parallel Programming for GPUs, MW 10:45-12:05, MEB 3105 Website: http://www.eng.utah.edu/~cs6963/

More information

Block Size Tradeoff (1/3) Benefits of Larger Block Size. Lecture #22 Caches II Block Size Tradeoff (3/3) Block Size Tradeoff (2/3)

Block Size Tradeoff (1/3) Benefits of Larger Block Size. Lecture #22 Caches II Block Size Tradeoff (3/3) Block Size Tradeoff (2/3) CS61C L22 Caches II (1) inst.eecs.berkeley.edu/~cs61c CS61C Machine Structures CPS today! Lecture #22 Caches II 25-11-16 There is one handout today at the front and back of the room! Lecturer PSOE, new

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 24 Introduction to CPU Design 2007-03-14 CS61C L24 Introduction to CPU Design (1) Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia

More information

CS 194 Parallel Programming. Why Program for Parallelism?

CS 194 Parallel Programming. Why Program for Parallelism? CS 194 Parallel Programming Why Program for Parallelism? Katherine Yelick yelick@cs.berkeley.edu http://www.cs.berkeley.edu/~yelick/cs194f07 8/29/2007 CS194 Lecure 1 What is Parallel Computing? Parallel

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 31 Caches I 2007-04-06 Powerpoint bad!! Research done at the Univ of NSW says that working memory, the brain part providing

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 12 Caches I Lecturer SOE Dan Garcia Midterm exam in 3 weeks! A Mountain View startup promises to do Dropbox one better. 10GB free storage,

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 32 Caches III 2008-04-16 Lecturer SOE Dan Garcia Hi to Chin Han from U Penn! Prem Kumar of Northwestern has created a quantum inverter

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L40 I/O: Disks (1) Lecture 40 I/O : Disks 2004-12-03 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia I talk to robots Japan's growing

More information

Review. Motivation for Input/Output. What do we need to make I/O work?

Review. Motivation for Input/Output. What do we need to make I/O work? Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 34 Input / Output 2008-04-23 Hi to Gary McCoy from Tampa Florida! Arduino is an open-source electronics prototyping

More information

MapReduce-style data processing

MapReduce-style data processing MapReduce-style data processing Software Languages Team University of Koblenz-Landau Ralf Lämmel and Andrei Varanovich Related meanings of MapReduce Functional programming with map & reduce An algorithmic

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 14 Introduction to MIPS Instruction Representation II 2004-02-23 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia In the US, who is

More information

MapReduce. Simplified Data Processing on Large Clusters (Without the Agonizing Pain) Presented by Aaron Nathan

MapReduce. Simplified Data Processing on Large Clusters (Without the Agonizing Pain) Presented by Aaron Nathan MapReduce Simplified Data Processing on Large Clusters (Without the Agonizing Pain) Presented by Aaron Nathan The Problem Massive amounts of data >100TB (the internet) Needs simple processing Computers

More information

Database System Architectures Parallel DBs, MapReduce, ColumnStores

Database System Architectures Parallel DBs, MapReduce, ColumnStores Database System Architectures Parallel DBs, MapReduce, ColumnStores CMPSCI 445 Fall 2010 Some slides courtesy of Yanlei Diao, Christophe Bisciglia, Aaron Kimball, & Sierra Michels- Slettvet Motivation:

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

Parallel Computing: MapReduce Jin, Hai

Parallel Computing: MapReduce Jin, Hai Parallel Computing: MapReduce Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! MapReduce is a distributed/parallel computing framework introduced by Google

More information

Advanced Database Technologies NoSQL: Not only SQL

Advanced Database Technologies NoSQL: Not only SQL Advanced Database Technologies NoSQL: Not only SQL Christian Grün Database & Information Systems Group NoSQL Introduction 30, 40 years history of well-established database technology all in vain? Not at

More information

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab CS6030 Cloud Computing Ajay Gupta B239, CEAS Computer Science Department Western Michigan University ajay.gupta@wmich.edu 276-3104 1 Acknowledgements I have liberally borrowed these slides and material

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41 Performance II CS61C L41 Performance II (1) Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia UWB Ultra Wide Band! The FCC moved

More information

Chae, Summer 2008 UCB. manycore is coming. Chae, Summer 2008 UCB

Chae, Summer 2008 UCB. manycore is coming. Chae, Summer 2008 UCB inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #27 Parallelism in Software 2008806 What Can We Do? Wait for our machines to get faster? Moore s law tells us things are getting better;

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures www.cs.berkeley.edu/~ddgarcia Google takes on Office! Google Apps: premium services (email, instant vs messaging, calendar, web creation,

More information

IEEE Standard 754 for Binary Floating-Point Arithmetic.

IEEE Standard 754 for Binary Floating-Point Arithmetic. CS61C L11 Floating Point II (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #11 Floating Point II 2005-10-05 There is one handout today at the front and back of the room! Lecturer

More information

CS240A: Applied Parallel Computing. Introduction

CS240A: Applied Parallel Computing. Introduction CS240A: Applied Parallel Computing Introduction 1 CS 240A Course Information Web page: http://www.cs.ucsb.edu/~tyang/class/240a17 Class schedule: Tu/Th. 1pm-2:50pm Phelp 2510 Instructor: Tao Yang (tyang

More information

BİL 542 Parallel Computing

BİL 542 Parallel Computing BİL 542 Parallel Computing 1 Chapter 1 Parallel Programming 2 Why Use Parallel Computing? Main Reasons: Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion,

More information

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29 Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions

More information

Direct-Mapped Cache Terminology. Caching Terminology. TIO Dan s great cache mnemonic. Accessing data in a direct mapped cache

Direct-Mapped Cache Terminology. Caching Terminology. TIO Dan s great cache mnemonic. Accessing data in a direct mapped cache Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 31 Caches II 2008-04-14 Hi to Yi Luo from Seattle, WA! In this week s Science, IBM researchers describe a new

More information

PART I - Fundamentals of Parallel Computing

PART I - Fundamentals of Parallel Computing PART I - Fundamentals of Parallel Computing Objectives What is scientific computing? The need for more computing power The need for parallel computing and parallel programs 1 What is scientific computing?

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #18 Introduction to CPU Design 2007-7-25 Scott Beamer, Instructor CS61C L18 Introduction to CPU Design (1) What about overflow? Consider

More information

Overview. Why MapReduce? What is MapReduce? The Hadoop Distributed File System Cloudera, Inc.

Overview. Why MapReduce? What is MapReduce? The Hadoop Distributed File System Cloudera, Inc. MapReduce and HDFS This presentation includes course content University of Washington Redistributed under the Creative Commons Attribution 3.0 license. All other contents: Overview Why MapReduce? What

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #23 Cache I 2007-8-2 Scott Beamer, Instructor CS61C L23 Caches I (1) The Big Picture Computer Processor (active) Control ( brain ) Datapath

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 14 Introduction to MIPS Instruction Representation II Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Are you P2P sharing fans? Two

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Is this the beginning of the end for our beloved Lecture 32 Caches I 2004-11-12 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia The Incredibles!

More information

Let s say I give you a homework assignment today with 100 problems. Each problem takes 2 hours to solve. The homework is due tomorrow.

Let s say I give you a homework assignment today with 100 problems. Each problem takes 2 hours to solve. The homework is due tomorrow. Let s say I give you a homework assignment today with 100 problems. Each problem takes 2 hours to solve. The homework is due tomorrow. Big problems and Very Big problems in Science How do we live Protein

More information

Parallel Nested Loops

Parallel Nested Loops Parallel Nested Loops For each tuple s i in S For each tuple t j in T If s i =t j, then add (s i,t j ) to output Create partitions S 1, S 2, T 1, and T 2 Have processors work on (S 1,T 1 ), (S 1,T 2 ),

More information

Parallel Partition-Based. Parallel Nested Loops. Median. More Join Thoughts. Parallel Office Tools 9/15/2011

Parallel Partition-Based. Parallel Nested Loops. Median. More Join Thoughts. Parallel Office Tools 9/15/2011 Parallel Nested Loops Parallel Partition-Based For each tuple s i in S For each tuple t j in T If s i =t j, then add (s i,t j ) to output Create partitions S 1, S 2, T 1, and T 2 Have processors work on

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #11 Floating Point II Scott Beamer, Instructor Sony & Nintendo make E3 News 2007-7-12 CS61C L11 Floating Point II (1) www.nytimes.com Review

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 12 Caches I 2014-09-26 Instructor: Miki Lustig September 23: Another type of Cache PayPal Integrates Bitcoin Processors BitPay, Coinbase

More information

The MapReduce Abstraction

The MapReduce Abstraction The MapReduce Abstraction Parallel Computing at Google Leverages multiple technologies to simplify large-scale parallel computations Proprietary computing clusters Map/Reduce software library Lots of other

More information

IEEE Standard 754 for Binary Floating-Point Arithmetic.

IEEE Standard 754 for Binary Floating-Point Arithmetic. CS61C L14 MIPS Instruction Representation II (1) inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 16 Floating Point II 27-2-23 Lecturer SOE Dan Garcia As Pink Floyd crooned:

More information

Concurrency & Parallelism, 10 mi

Concurrency & Parallelism, 10 mi The Beauty and Joy of Computing Lecture #7 Concurrency Instructor : Sean Morris Quest (first exam) in 5 days!! In this room! Concurrency & Parallelism, 10 mi up Intra-computer Today s lecture Multiple

More information

1.13 Historical Perspectives and References

1.13 Historical Perspectives and References Case Studies and Exercises by Diana Franklin 61 Appendix H reviews VLIW hardware and software, which, in contrast, are less popular than when EPIC appeared on the scene just before the last edition. Appendix

More information

xx.yyyy Lecture #11 Floating Point II Summary (single precision): Precision and Accuracy Fractional Powers of 2 Representation of Fractions

xx.yyyy Lecture #11 Floating Point II Summary (single precision): Precision and Accuracy Fractional Powers of 2 Representation of Fractions CS61C L11 Floating Point II (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #11 Floating Point II 2007-7-12 Scott Beamer, Instructor Sony & Nintendo make E3 News www.nytimes.com Review

More information

The NoSQL Movement. FlockDB. CSCI 470: Web Science Keith Vertanen

The NoSQL Movement. FlockDB. CSCI 470: Web Science Keith Vertanen The NoSQL Movement FlockDB CSCI 470: Web Science Keith Vertanen 2 3 h2p://techcrunch.com/2012/10/27/big- data- right- now- five- trendy- open- source- technologies/ 4 h2p://blog.beany.co.kr/archives/275

More information

In the news. Request- Level Parallelism (RLP) Agenda 10/7/11

In the news. Request- Level Parallelism (RLP) Agenda 10/7/11 In the news "The cloud isn't this big fluffy area up in the sky," said Rich Miller, who edits Data Center Knowledge, a publicaeon that follows the data center industry. "It's buildings filled with servers

More information

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Performance Analysis Instructor: Haidar M. Harmanani Spring 2018 Outline Performance scalability Analytical performance measures Amdahl

More information

www-inst.eecs.berkeley.edu/~cs61c/

www-inst.eecs.berkeley.edu/~cs61c/ CS61C Machine Structures Lecture 34 - Caches II 11/16/2007 John Wawrzynek (www.cs.berkeley.edu/~johnw) www-inst.eecs.berkeley.edu/~cs61c/ 1 What to do on a write hit? Two Options: Write-through update

More information

Clever Signed Adder/Subtractor. Five Components of a Computer. The CPU. Stages of the Datapath (1/5) Stages of the Datapath : Overview

Clever Signed Adder/Subtractor. Five Components of a Computer. The CPU. Stages of the Datapath (1/5) Stages of the Datapath : Overview Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 24 Introduction to CPU design Hi to Vitaly Babiy from Albany, NY! 2008-03-21 Stanford researchers developing

More information

MapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia

MapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia MapReduce Spark Some slides are adapted from those of Jeff Dean and Matei Zaharia What have we learnt so far? Distributed storage systems consistency semantics protocols for fault tolerance Paxos, Raft,

More information

Agenda. Request- Level Parallelism. Agenda. Anatomy of a Web Search. Google Query- Serving Architecture 9/20/10

Agenda. Request- Level Parallelism. Agenda. Anatomy of a Web Search. Google Query- Serving Architecture 9/20/10 Agenda CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H. Katz David A. PaHerson hhp://inst.eecs.berkeley.edu/~cs61c/fa10 Request and Data Level Parallelism Administrivia

More information

The Beauty and Joy of Computing

The Beauty and Joy of Computing The Beauty and Joy of Computing Lecture #8 : Concurrency UC Berkeley Teaching Assistant Yaniv Rabbit Assaf Friendship Paradox On average, your friends are more popular than you. The average Facebook user

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

Fra superdatamaskiner til grafikkprosessorer og

Fra superdatamaskiner til grafikkprosessorer og Fra superdatamaskiner til grafikkprosessorer og Brødtekst maskinlæring Prof. Anne C. Elster IDI HPC/Lab Parallel Computing: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc

More information

Introduction to MapReduce. Adapted from Jimmy Lin (U. Maryland, USA)

Introduction to MapReduce. Adapted from Jimmy Lin (U. Maryland, USA) Introduction to MapReduce Adapted from Jimmy Lin (U. Maryland, USA) Motivation Overview Need for handling big data New programming paradigm Review of functional programming mapreduce uses this abstraction

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures #16 Cal rolls over OSU Behind the arm of Nate Longshore s 341 yds passing & 4 TDs, the Bears roll 41-13. Recall they stopped our winning streak

More information

Brought to you by CalLUG (UC Berkeley GNU/Linux User Group). Tuesday, September 20, 6-8 PM in 100 GPB.

Brought to you by CalLUG (UC Berkeley GNU/Linux User Group). Tuesday, September 20, 6-8 PM in 100 GPB. Hate EMACS? Love EMACS? Richard M. Stallman, a famous proponent of opensource software, the founder of the GNU Project, and the author of emacs and gcc, will be giving a speech. We're working on securing

More information

Lecture #6 Intro MIPS; Load & Store Assembly Variables: Registers (1/4) Review. Unlike HLL like C or Java, assembly cannot use variables

Lecture #6 Intro MIPS; Load & Store Assembly Variables: Registers (1/4) Review. Unlike HLL like C or Java, assembly cannot use variables CS61C L6 Intro MIPS ; Load & Store (1) Hate EMACS? Love EMACS? Richard M. Stallman, a famous proponent of opensource software, the founder of the GNU Project, and the author of emacs and gcc, will be giving

More information

CT Parallel and Distributed Computing

CT Parallel and Distributed Computing CT5708701 Parallel and Distributed Computing Lecturer:Yo-Ming Hsieh Time: 09:30AM 12:20PM (Tue.) Course website: On Blackboard System http://elearning.ntust.edu.tw Class room: IB-505 Grading: 30 pts: Mid/Final

More information

High Performance Computing. Introduction to Parallel Computing

High Performance Computing. Introduction to Parallel Computing High Performance Computing Introduction to Parallel Computing Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials

More information

Parallel Computing. Parallel Computing. Hwansoo Han

Parallel Computing. Parallel Computing. Hwansoo Han Parallel Computing Parallel Computing Hwansoo Han What is Parallel Computing? Software with multiple threads Parallel vs. concurrent Parallel computing executes multiple threads at the same time on multiple

More information

Top500 Supercomputer list

Top500 Supercomputer list Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 39 Intra-machine Parallelism 2010-04-30!!!Head TA Scott Beamer!!!www.cs.berkeley.edu/~sbeamer Old-Fashioned Mud-Slinging with

More information

ECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing

ECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing ECE5610/CSC6220 Introduction to Parallel and Distribution Computing Lecture 6: MapReduce in Parallel Computing 1 MapReduce: Simplified Data Processing Motivation Large-Scale Data Processing on Large Clusters

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds. Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum

More information

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program. Performance COMP375 Computer Architecture and dorganization What is Good Performance Which is the best performing jet? Airplane Passengers Range (mi) Speed (mph) Boeing 737-100 101 630 598 Boeing 747 470

More information