Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture), Dr. Gerald Friedland


1 Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture), Dr. Gerald Friedland 1

2 Today Answers to Questions How to estimate resources for large data projects - Some basic issues - Factors to take into account - Different methods to estimate - Example 2

3 FAQ How is the grade computed? 15% Mid-Term Exam 15% Final Exam 70% Project 3

4 FAQ What is graded in the project? Final Presentation Project Report (details to follow) 4

5 Some Basics 5

6 Some Basics Why can't we automate the 5

7 Some Basics Why can't we automate the estimation of resources for large 5

8 Some Basics Why can't we automate the estimation of resources for large data projects? 5

9 Some Basics Why can't we automate the estimation of resources for large data projects? Halting problem 5

10 Some Basics Why can't we automate the estimation of resources for large data projects? Halting problem: undecidable!! 5

11 Some Basics Why can't we automate the estimation of resources for large data projects? Halting problem: undecidable!! 5

12 More Basics 6

13 More Basics Why can't we just parallelize everything? 6

14 More Basics Why can't we just parallelize everything? Amdahl's Law! 6

15 More Basics Why can't we just parallelize everything? Amdahl's Law! Amdahl, Gene (1967). "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". AFIPS Conference Proceedings 30: 483-485.

16 Amdahl's Law S = 1 / (P + (1 - P) / N) where: S is the speed-up expected, P is the unparallelizable portion, N is the number of CPUs 7

17 Amdahl's Law, rearranged to estimate the parallelized portion from a measurement: (1/SU - 1) / (1/NP - 1) where: SU is the speed-up measured, NP is the number of CPUs 8

18 Amdahl's Law 9

19 Amdahl's Law 10

20 Amdahl's Law Intuition: Unparallelizable portion dominates runtime. 10

21 Amdahl's Law Intuition: Unparallelizable portion dominates runtime. => Diminishing returns when adding more CPUs. 10
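
To make the diminishing returns concrete, here is a small illustrative Java snippet (not from the slides) that evaluates the slide-16 formula S = 1 / (P + (1 - P) / N) for a hypothetical unparallelizable portion of 5% and a growing number of CPUs:

    // Illustrative only: Amdahl's Law S = 1 / (P + (1 - P) / N),
    // with P = unparallelizable fraction (assumed 5% here) and N = number of CPUs.
    public class AmdahlDemo {
        static double speedup(double p, int n) {
            return 1.0 / (p + (1.0 - p) / n);
        }

        public static void main(String[] args) {
            double p = 0.05;  // assumption: 5% of the work cannot be parallelized
            for (int n : new int[] {1, 2, 4, 8, 16, 64, 256, 4096}) {
                System.out.printf("N = %4d  ->  speed-up = %.2f%n", n, speedup(p, n));
            }
            // Even with unlimited CPUs the speed-up is bounded by 1/p = 20x.
        }
    }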

22 Factors to take into Account when estimating runtime 11

23 Factors to take into Account when estimating runtime CPU usage (number of instructions) 11

24 Factors to take into Account when estimating runtime CPU usage (number of instructions) I/O usage 11

25 Factors to take into Account when estimating runtime CPU usage (number of instructions) I/O usage Memory requirements 11

26 Factors to take into Account when estimating runtime CPU usage (number of instructions) I/O usage Memory requirements External factors 11

27 Factors to take into Account when estimating runtime CPU usage (number of instructions) I/O usage Memory requirements External factors The human factor 11

28 CPU Usage 12

29 CPU Usage How many instructions does the program need per byte of input data? 12

30 CPU Usage How many instructions does the program need per byte of input data? - Similar to O(n) notation of theoretical computer science 12

31 CPU Usage How many instructions does the program need per byte of input data? - Similar to O(n) notation of theoretical computer science - Problem: Machine learning algorithms converge! Runtime ~ Entropy of input 12
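
A rough back-of-the-envelope sketch of this kind of estimate (all numbers below are assumptions for illustration, not figures from the lecture): multiply the estimated instructions per byte by the input size and divide by the effective instruction throughput of one core.

    // Illustrative CPU-time estimate; every constant here is an assumption.
    public class CpuEstimate {
        public static void main(String[] args) {
            double bytesOfInput = 500e9;          // assumed: 500 GB of input data
            double instructionsPerByte = 200;     // assumed: measured on a small sample
            double instructionsPerSecond = 2e9;   // assumed: effective throughput of one core

            double seconds = bytesOfInput * instructionsPerByte / instructionsPerSecond;
            System.out.printf("CPU-only estimate: %.1f hours on one core%n", seconds / 3600.0);
            // Caveat from the slide: for machine learning algorithms that iterate until
            // convergence, instructions per byte is not a constant; it depends on the data
            // (its entropy), not only on its size.
        }
    }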

32 I/O Usage 13

33 I/O Usage I/O might be: - memory access - storage (HD) access - network access 13

34 I/O Usage I/O might be: - memory access - storage (HD) access - network access Orders of magnitude slower than CPU access! 13

35 I/O Usage Abraham Silberschatz, Greg Gagne, and Peter Baer Galvin, "Operating System Concepts, Eighth Edition ", Chapter 1 14

36 I/O Usage Abraham Silberschatz, Greg Gagne, and Peter Baer Galvin, "Operating System Concepts, Eighth Edition ", Chapter 1 15

37 I/O Usage Abraham Silberschatz, Greg Gagne, and Peter Baer Galvin, "Operating System Concepts, Eighth Edition ", Chapter 1 16

38 I/O Usage 17

39 I/O Usage How do you store and access the central data input? 17

40 I/O Usage How do you store and access the central data input? How do you access and store intermediate files? 17

41 I/O Usage How do you store and access the central data input? How do you access and store intermediate files? How do you write out (debugging) output? 17

42 I/O Usage How do you store and access the central data input? How do you access and store intermediate files? How do you write out (debugging) output? How many I/O operations will it be? 17

43 I/O Usage How do you store and access the central data input? How do you access and store intermediate files? How do you write out (debugging) output? How many I/O operations will it be? Using which device latencies? 17
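
The same style of estimate for the I/O side (the device latencies below are rough orders of magnitude chosen for illustration, not the values from the Silberschatz table):

    // Illustrative I/O-time estimate; the latencies are order-of-magnitude assumptions.
    public class IoEstimate {
        public static void main(String[] args) {
            long ops = 10_000_000L;              // assumed number of I/O operations
            double ramLatency  = 100e-9;         // ~100 ns per memory access (assumption)
            double diskLatency = 10e-3;          // ~10 ms per disk seek (assumption)
            double netLatency  = 50e-3;          // ~50 ms per remote request (assumption)

            System.out.printf("RAM:     %.0f s%n", ops * ramLatency);
            System.out.printf("Disk:    %.0f s (about %.1f days)%n",
                    ops * diskLatency, ops * diskLatency / 86400);
            System.out.printf("Network: %.0f s (about %.1f days)%n",
                    ops * netLatency, ops * netLatency / 86400);
            // The same ten million accesses cost about a second in RAM but days on disk
            // or over the network, which is why the placement of intermediate files matters.
        }
    }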

44 Memory Requirements 18

45 Memory Requirements How can the cache be used most efficiently? 18

46 Memory Requirements How can the cache be used most efficiently? How much memory is available per CPU? 18

47 Memory Requirements How can the cache be used most efficiently? How much memory is available per CPU? Is the memory actually physical or virtual? Avoid Thrashing! 18
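
One concrete instance of the cache question (an added illustration, not from the slides): the traversal order over a large 2D array. In Java each row is stored contiguously, so row-by-row access reuses cache lines while column-by-column access misses the cache on almost every step.

    // Illustration: cache-friendly vs. cache-hostile traversal of a 2D array.
    public class CacheDemo {
        public static void main(String[] args) {
            int n = 4096;
            double[][] a = new double[n][n];     // ~128 MB, far larger than any cache

            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++)          // row-major: walks memory sequentially
                for (int j = 0; j < n; j++)
                    a[i][j] += 1.0;
            long t1 = System.nanoTime();
            for (int j = 0; j < n; j++)          // column-major: jumps between rows on
                for (int i = 0; i < n; i++)      // every step, defeating the cache
                    a[i][j] += 1.0;
            long t2 = System.nanoTime();

            System.out.printf("row-major:    %d ms%n", (t1 - t0) / 1_000_000);
            System.out.printf("column-major: %d ms%n", (t2 - t1) / 1_000_000);
        }
    }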

48 External Factors 19

49 External Factors How many CPUs are available at each time? 19

50 External Factors How many CPUs are available at each time? Are the CPUs of the same type? 19

51 External Factors How many CPUs are available at each time? Are the CPUs of the same type? Are special purpose processors available (GPUs)? 19

52 External Factors How many CPUs are available at each time? Are the CPUs of the same type? Are special purpose processors available (GPUs)? How expensive is data transfer? 19

53 External Factors How many CPUs are available at each time? Are the CPUs of the same type? Are special purpose processors available (GPUs)? How expensive is data transfer? Is the system functioning reliably? 19

54 Human Factors 20

55 Human Factors Do you still need to experiment? 20

56 Human Factors Do you still need to experiment? Are there possibly bugs? 20

57 Human Factors Do you still need to experiment? Are there possibly bugs? Can the algorithm be continued? 20

58 Human Factors Do you still need to experiment? Are there possibly bugs? Can the algorithm be continued? Do I need to supervise the execution? 20

59 Human Factors Do you still need to experiment? Are there possibly bugs? Can the algorithm be continued? Do I need to supervise the execution? When does my batch finish? Do I need to log out in the middle of the night? 20

60 Manual Estimation 21

61 Manual Estimation Check loops and nested loops: for, while 21

62 Manual Estimation Check loops and nested loops: for, while Nesting over input determines order of magnitude! 21

63 Manual Estimation Check loops and nested loops: for, while Nesting over input determines order of magnitude! Find memory allocations: new(), malloc(), etc. 21

64 Manual Estimation Check loops and nested loops: for, while Nesting over input determines order of magnitude! Find memory allocations: new(), malloc(), etc. Find I/O commands: read(), write() 21

65 Problem String readFile(java.io.FileReader fileReader) throws java.io.IOException { java.io.BufferedReader br = new java.io.BufferedReader(fileReader); String s = ""; String line; while (null != (line = br.readLine())) s += line + "\n"; return s; } 22

66 Problem Manual estimation is hard and often counterintuitive! Example: String readFile(java.io.FileReader fileReader) throws java.io.IOException { java.io.BufferedReader br = new java.io.BufferedReader(fileReader); String s = ""; String line; while (null != (line = br.readLine())) s += line + "\n"; return s; } 22

67 Problem 23

68 Problem Reading in the ASCII version of Moby Dick, at roughly 69 characters per line (1.1 MB in total, not BIG DATA)... 23

69 Problem Reading in the ASCII version of Moby Dick, at roughly 69 characters per line (1.1 MB in total, not BIG DATA)... ...shuffles around gigabytes! 23
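
The culprit is the s += line + "\n" inside the loop: Java strings are immutable, so every += copies the whole text accumulated so far, and reading n lines copies on the order of n times the file size in characters. For a roughly 1 MB file with thousands of lines that is indeed gigabytes of copying. A sketch of the standard fix, accumulating into a StringBuilder so the total work stays linear:

    // Same behavior as the slide's readFile, but with linear instead of quadratic copying.
    // (Sketch of the usual fix, not the lecture's code.)
    String readFile(java.io.FileReader fileReader) throws java.io.IOException {
        java.io.BufferedReader br = new java.io.BufferedReader(fileReader);
        StringBuilder sb = new StringBuilder();
        String line;
        while (null != (line = br.readLine())) {
            sb.append(line).append('\n');   // appends in place, no copy of the text so far
        }
        return sb.toString();
    }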

70 Formal Methods 24

71 Formal Methods Similarity templates 24

72 Formal Methods Similarity templates Rough Set-based methods 24

73 Formal Methods Similarity templates Rough Set-based methods Queueing theory 24

74 Formal Methods Similarity templates Rough Set-based methods Queueing theory Autotuning 24

75 Formal Methods Similarity templates Rough Set-based methods Queueing theory Autotuning Problem: Usually limited due to assumptions, might err as well... 24

76 Inductive Algorithm Design 25

77 Inductive Algorithm Design Show algorithm works on small set 25

78 Inductive Algorithm Design Show algorithm works on small set Show error doesn't get worse when adding more data 25

79 Inductive Algorithm Design Show algorithm works on small set Show error doesn't get worse when adding more data Deduce that it will work for arbitrary data sizes 25

80 Problem: Non-Linear Scale [Chart: PARDORA Query Performance, total time (seconds) vs. number of songs, 0 to 2500] 26

81 Optimization: General Approach 27

82 Optimization: General Approach Use spiral development scheme: 27

83 Optimization: General Approach Use spiral development scheme: Estimate computational needs on smaller scale. 27

84 Optimization: General Approach Use spiral development scheme: Estimate computational needs on smaller scale. Verify empirically. 27

85 Optimization: General Approach Use spiral development scheme: Estimate computational needs on smaller scale. Verify empirically. If verification OK, go larger scale and repeat 27

86 Optimization: General Approach Use spiral development scheme: Estimate computational needs on smaller scale. Verify empirically. If verification OK, go larger scale and repeat If not: debug estimation! 27

87 How to debug estimation 28

88 How to debug estimation Find the bottlenecks 28

89 How to debug estimation Find the bottlenecks Prioritize worst bottleneck first 28

90 How to debug estimation Find the bottlenecks Prioritize worst bottleneck first Problem: Diminishing Returns! 28

91 Fighting Bottlenecks 29

92 Fighting Bottlenecks Is there a faster algorithm? 29

93 Fighting Bottlenecks Is there a faster algorithm? Would an approximation do it? 29

94 Fighting Bottlenecks Is there a faster algorithm? Would an approximation do it? Can we do a fast match? 29

95 Fighting Bottlenecks Is there a faster algorithm? Would an approximation do it? Can we do a fast match? Can we parallelize? 29

96 Fighting Bottlenecks Is there a faster algorithm? Would an approximation do it? Can we do a fast match? Can we parallelize? Can we parallelize using specialized hardware? 29

97 Example: Iterative Optimization of ICSI Diarization System 30

98 Example: Iterative Optimization of ICSI Diarization System General Problem: 30

99 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). 30

100 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: 30

101 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: CPU parallelism limited and coarse but easier to implement 30

102 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: CPU parallelism limited and coarse but easier to implement GPU parallelism finer but harder to implement 30

103 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: CPU parallelism limited and coarse but easier to implement GPU parallelism finer but harder to implement Current Solution: Hybrid Approach: 30

104 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: CPU parallelism limited and coarse but easier to implement GPU parallelism finer but harder to implement Current Solution: Hybrid Approach: CPU parallelism per cluster 30

105 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: CPU parallelism limited and coarse but easier to implement GPU parallelism finer but harder to implement Current Solution: Hybrid Approach: CPU parallelism per cluster GPU parallelism per frame 30

106 Example: Iterative Optimization of ICSI Diarization System General Problem: Rewriting from a base of highly optimized serial code (~10k lines for the core algorithm). Current Technical Issue: CPU parallelism limited and coarse but easier to implement GPU parallelism finer but harder to implement Current Solution: Hybrid Approach: CPU parallelism per cluster GPU parallelism per frame Process: Bottleneck by Bottleneck... 30

107 Speaker Diarization Overview BERKELEY PAR LAB [Block diagram] Processing Chain stages: Audio Signal; Dynamic Range Compression; Wiener Filtering; Beamforming; Delay Features; Short-Term Feature Extraction (MFCC); Long-Term Feature Extraction (Prosodics, only speech); Speech/Non-Speech Detector; Segmentation (Initial Segments); Diarization Engine (EM Clustering) -> "who spoke when" Overall Structure: Pipe & Filter 31

108 Speaker Diarization Overview BERKELEY PAR LAB [Block diagram] Processing Chain stages: Audio Signal; Dynamic Range Compression; Wiener Filtering; Beamforming; Delay Features; Short-Term Feature Extraction (MFCC); Long-Term Feature Extraction (Prosodics, only speech); Speech/Non-Speech Detector; Segmentation (Initial Segments); Diarization Engine (EM Clustering) -> "who spoke when" Overall Structure: Pipe & Filter 31

109 Algorithm outline: Speaker Diarization: Core Algorithm BERKELEY PAR LAB Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 Cluster2 Cluster1 Cluster2 Cluster3 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

110 Speaker Diarization: Core Algorithm BERKELEY PAR LAB Algorithm outline: Initialization Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

111 Speaker Diarization: Core Algorithm BERKELEY PAR LAB Algorithm outline: Initialization Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

112 Speaker Diarization: Core Algorithm BERKELEY PAR LAB Algorithm outline: Initialization Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 (Re-)Training Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

113 Speaker Diarization: Core Algorithm BERKELEY PAR LAB Algorithm outline: Initialization Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 (Re-)Training Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

114 Speaker Diarization: Core Algorithm BERKELEY PAR LAB Algorithm outline: Initialization Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 (Re-)Training (Re-)Alignment Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

115 Algorithm outline: Speaker Diarization: Core Algorithm Initialization BERKELEY PAR LAB Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 (Re-)Training (Re-)Alignment Cluster2 Cluster1 Cluster2 Cluster3 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

116 Algorithm outline: Speaker Diarization: Core Algorithm Initialization BERKELEY PAR LAB Cluster1 Cluster2 Cluster3 Cluster1 Cluster2 Cluster3 (Re-)Training Yes Merge two Clusters? (Re-)Alignment Cluster2 Cluster1 Cluster2 Cluster3 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

117 Algorithm outline: Speaker Diarization: Core Algorithm Initialization BERKELEY PAR LAB Cluster2 Cluster1 Cluster2 Cluster2 (Re-)Training Yes Merge two Clusters? (Re-)Alignment Cluster2 Cluster1 Cluster2 Cluster3 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

118 Algorithm outline: Speaker Diarization: Core Algorithm Initialization BERKELEY PAR LAB Cluster2 Cluster1 Cluster2 Cluster2 (Re-)Training Yes Merge two Clusters? (Re-)Alignment Cluster2 Cluster1 Cluster2 Cluster3 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

119 Algorithm outline: Speaker Diarization: Core Algorithm Initialization BERKELEY PAR LAB Cluster2 Cluster1 Cluster2 Cluster2 (Re-)Training Yes Merge two Clusters? (Re-)Alignment Cluster1 Cluster2 Cluster1 Cluster2 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32

120 Algorithm outline: Speaker Diarization: Core Algorithm Initialization BERKELEY PAR LAB Cluster2 Cluster1 Cluster2 Cluster2 (Re-)Training Yes Merge two Clusters? No End (Re-)Alignment Cluster1 Cluster2 Cluster1 Cluster2 Start with too many clusters (initialized randomly) Purify clusters by comparing and merging similar clusters Resegment and repeat until no more merging needed 32
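
For readers of the transcript, a compact Java-style sketch of the agglomerative loop outlined above. Cluster, trainGmm, realign and bicDelta are simplified placeholders introduced for illustration; this is not the ICSI implementation.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the agglomerative diarization loop described on these slides.
    public class DiarizationSketch {
        static class Cluster { List<double[]> frames = new ArrayList<>(); Object model; }

        static Object trainGmm(List<double[]> frames) { return new Object(); }   // placeholder
        static void realign(List<double[]> frames, List<Cluster> clusters) { }   // placeholder
        static double bicDelta(Cluster a, Cluster b) { return -1.0; }            // placeholder

        static void diarize(List<double[]> frames, List<Cluster> clusters) {
            while (true) {                                                 // clusters start "too many"
                for (Cluster c : clusters) c.model = trainGmm(c.frames);   // (Re-)Training
                realign(frames, clusters);                                 // (Re-)Alignment

                Cluster bestA = null, bestB = null;                        // find best pair to merge
                double bestDelta = 0;
                for (int i = 0; i < clusters.size(); i++)
                    for (int j = i + 1; j < clusters.size(); j++) {
                        double d = bicDelta(clusters.get(i), clusters.get(j));
                        if (d > bestDelta) { bestDelta = d; bestA = clusters.get(i); bestB = clusters.get(j); }
                    }

                if (bestA == null) break;                                  // nothing left to merge: End
                bestA.frames.addAll(bestB.frames);                         // merge the chosen pair
                clusters.remove(bestB);
            }
        }

        public static void main(String[] args) {
            diarize(new ArrayList<>(), new ArrayList<>());                 // trivial smoke test
        }
    }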

121 CPU Parallelization I BERKELEY PAR LAB Easiest: Different Streams = Different CPUs Delay Features Prosodics (only speech) MFCC (only Speech) Segmentation Diarization Engine Clustering "who spoke when" Parallelization: Each Feature Stream Processed in a Different CPU Thread (MapReduce) 33

122 CPU Parallelization I BERKELEY PAR LAB Easiest: Different Streams = Different CPUs Delay Features Prosodics (only speech) MFCC (only Speech) [Diagram: three parallel Segmentation -> Diarization Engine -> Clustering pipelines, one per feature stream] "who spoke when" Parallelization: Each Feature Stream Processed in a Different CPU Thread (MapReduce) 33

123 CPU Parallelization II New Bottleneck: Comparison of clusters in each iteration BERKELEY PAR LAB Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5... Cluster n (n choose 2) comparisons Bayesian Information Criterion e.g. Cluster 1 Cluster 5 Merging Decision Parallelization: Each pair comparison assigned to a different CPU thread (MapReduce with irregular data access) 34

124 CPU Parallelization II New Bottleneck: Comparison of clusters in each iteration BERKELEY PAR LAB Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5... Cluster n (n choose 2) comparisons Bayesian Information Criterion, one CPU thread per pair ((n choose 2) threads) e.g. Cluster 1 Cluster 5 Merging Decision Parallelization: Each pair comparison assigned to a different CPU thread (MapReduce with irregular data access) 34
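
A minimal sketch of farming the (n choose 2) comparisons out to a thread pool (illustrative; bicDelta and the plain Object clusters are placeholders, not the ICSI code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sketch: score every cluster pair on its own worker thread, then pick the best score.
    public class PairwiseComparison {
        static double bicDelta(Object a, Object b) { return Math.random(); }  // placeholder

        public static void main(String[] args) throws Exception {
            List<Object> clusters = new ArrayList<>();
            for (int i = 0; i < 16; i++) clusters.add(new Object());

            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            List<Future<Double>> scores = new ArrayList<>();
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {       // (n choose 2) pairs
                    Object a = clusters.get(i), b = clusters.get(j);
                    Callable<Double> task = () -> bicDelta(a, b);     // one task per pair
                    scores.add(pool.submit(task));
                }
            }

            double best = Double.NEGATIVE_INFINITY;
            for (Future<Double> f : scores) best = Math.max(best, f.get());
            pool.shutdown();
            System.out.println("best pairwise score: " + best);
        }
    }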

125 CPU Parallelization II: Result BERKELEY PAR LAB [Chart: Multicore Speaker Diarization, Skype Call, runtime with and without fast-match] Serial optimization more efficient than parallelization! 35

126 CPU Parallelization II BERKELEY PAR LAB New Bottleneck: Training Models Initialization (Re-)Training (Re-)Alignment Yes Merge two Clusters? No End Parallelization: Each Cluster Computed on a Different Core (MapReduce of dense linear algebra operations) 36

127 CPU Parallelization II BERKELEY PAR LAB New Bottleneck: Training Models Initialization (Re-)Training (Re-)Alignment Yes Merge two Clusters? No End Parallelization: Each Cluster Computed on a Different Core (MapReduce of dense linear algebra operations) 36

128 Diarization CPU Parallelization BERKELEY PAR LAB [Chart: Runtime/Core in xRT vs. number of Cores] Parallelized Portion of the Runtime: ~62% 37
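
A quick consequence of the ~62% figure, using the Amdahl formula from slide 16: with 62% of the runtime parallelized, the speed-up on N cores is bounded by 1 / (0.38 + 0.62 / N), i.e. roughly 1.4x on 2 cores, 2.1x on 6 cores, and never more than 1 / 0.38 (about 2.6x) no matter how many cores are added.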

129 Diarization GPU Parallelization BERKELEY PAR LAB New Bottleneck: Computation of Log-Likelihoods (Initially 28% of runtime, now 60% of runtime) Initialization (Re-)Training (Re-)Alignment Yes Merge two Clusters? No End Parallelization: Use CUDA for Frame-Level Parallelization. 38

130 Diarization GPU Parallelization BERKELEY PAR LAB New Bottleneck: Computation of Log-Likelihoods (Initially 28% of runtime, now 60% of runtime) Initialization (Re-)Training (Re-)Alignment Yes Merge two Clusters? No End Parallelization: Use CUDA for Frame-Level Parallelization. 38

131 Diarization GPU Parallelization BERKELEY PAR LAB [Diagram: frames F1...Fn split into chunks, one frame per CUDA thread (Fi = Thread i); GMM 1...GMM k held in CUDA memory; per-frame log-likelihoods ll1...lln computed in parallel] Result: Total Speaker Diarization speed-up 4.8 -> 0.07 xRT 39
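
Simple arithmetic on the quoted figures: going from 4.8x real time to 0.07x real time is roughly a 69x end-to-end speed-up, and at 0.07 xRT an hour of audio is diarized in about four minutes.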

132 Next Week (Project Meeting) Jaeyoung Choi on MediaEval 2012

133 Next Week (Lecture) Mid-Term Exam! 41
