Call Paths for Pin Tools
|
|
- Bathsheba Walsh
- 5 years ago
- Views:
Transcription
1 , Xu Liu, and John Mellor-Crummey Department of Computer Science Rice University CGO'14, Orlando, FL February 17, 2014
2 What is a Call Path? main() A() B() Foo() { x = *ptr;} Chain of function calls that led to the current point in the program. (a.k.a Calling Context / Call Stack / Backtrace / Activation Record)
3 What is a Call Path? Performance Analysis Tools main() A() Debuggers B() Foo() { x = *ptr;} Chain of function calls that led to the current point in the program. (a.k.a Calling Context / Call Stack / Backtrace / Activation Record)
4 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools Debugging, testing, resiliency, replay, etc.
5 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools Debugging, testing, resiliency, replay, etc.
6 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Attribute each conflicting access to its full call path Thread 1 Thread 2 Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools Debugging, testing, resiliency, replay, etc.
7 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools Debugging, testing, resiliency, replay, etc.
8 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools [Liu et al. ISPASS 13] Debugging, testing, resiliency, replay, etc. Attribute distance between use and reuse to references in full context
9 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools [Liu et al. ISPASS 13] Debugging, testing, resiliency, replay, etc. Attribute distance between use and reuse to references in full context
10 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools Debugging, testing, resiliency, replay, etc.
11 Need: Ubiquitous Call Paths Fine-grained monitoring tools Correctness Data race detection Taint analysis Array out of bound detection Performance Reuse-distance analysis Cache simulation False sharing detection Redundancy detection (e.g. dead writes) Other tools Debugging, testing, resiliency, replay, etc.
12 State-of-the-art in Collecting Ubiquitous Call Paths It will slow down execution by a factor of several thousand compared to native execution -- I'd guess -so you'll wind up with something that is unusably slow on anything except the smallest problems.! If you tried to invoke Thread::getCallStack on every memory access there would be very serious performance problems your program would probably never reach main. No support for collecting calling contexts
13 State-of-the-art in Collecting Ubiquitous Call Paths It will slow down execution by a factor of several thousand compared to native execution -- I'd guess -so you'll wind up with something that is unusably slow on anything except the smallest problems.! If you tried to invoke Thread::getCallStack on every memory access there would be very serious performance problems your program would probably never reach main. No support for collecting calling contexts We built one ourselves CCTLib
14 Roadmap CCTLib Ubiquitous call path collection Attributing costs to data objects Evaluation Conclusions
15 Roadmap CCTLib Ubiquitous call path collection Attributing costs to data objects Evaluation Conclusions
16 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
17 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
18 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
19 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
20 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
21 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
22 Top Three Challenges 1 Overhead (Space) 2 Overhead (Time) 3 Overhead (Parallel scaling)
23 Store History of Contexts Compactly Problem: Deluge of call paths
24 Store History of Contexts Compactly Problem: Deluge of call paths A A A A B B C C D E F G Instruction stream
25 Store History of Contexts Compactly Problem: Deluge of call paths A A A A B B C C D E F G Instruction stream
26 Store History of Contexts Compactly Problem: Deluge of call paths A A A A B B C C D E F G Instruction stream
27 Store History of Contexts Compactly Problem: Deluge of call paths Solution Call paths share common prefix Store call paths as a A A A A B B C C D E F G calling context tree (CCT) One CCT per thread A B C D E F G Instruction stream Instruction stream
28 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Main() P() Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
29 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Main() P() Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
30 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() P() Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
31 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() CTXT Main() P() Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
32 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() CTXT Main() call P() P() Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
33 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Main() Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() P() CTXT P() Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
34 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Main() Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() P() call P() Tools can obtain! pointer to the! current context! via CTXT! in constant time Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} } CTXT Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
35 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Main() Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() P() CTXT P() return Tools can obtain! pointer to the! current context! via CTXT! in constant time Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} } Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} }
36 Shadow Stack to Avoid Unwinding Overhead Problem: Unwinding overhead Main() Solution: Reverse the process. Eagerly build a replica/shadow stack on-the-fly. Main() P() P() Tools can obtain! pointer to the! current context! via CTXT! in constant time call Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} } Foo() { *ptr = *ptr 100; = 100; x x = = 42; 42;} } CTXT Z()
37 Maintaining CONTE CTXT Main() P() W() CTXT Z()
38 Maintaining CONTE CTXT Main() CTXT P() Return to caller: Constant time update W() Z()
39 Maintaining CONTE CTXT Main() CTXT P() W() Z()
40 Maintaining CONTE CTXT Main() CTXT P() CTXT P() W() X() Y() Z() W() Z() Finding a callee from its caller involves a lookup
41 Maintaining CONTE CTXT Main() CTXT P() CTXT P() W() X() Y() Z() W() Z() Finding a callee from its caller involves a lookup
42 Maintaining CONTE CTXT Main() P() CTXT P() W() X() Y() CTXT Z() W() Z() Finding a callee from its caller involves a lookup
43 Accelerate Lookup with Splay Trees CTXT P() W() X() Y() Z() Splay tree [ Self-adjusting binary search trees by Sleator et al. 1985] ensures frequently called functions are near the root of the tree
44 Accelerate Lookup with Splay Trees P() W() X() Y() CTXT Z() Splay tree [ Self-adjusting binary search trees by Sleator et al. 1985] ensures frequently called functions are near the root of the tree
45 Accelerate Lookup with Splay Trees CTXT P() W() X() Y() Z() Splay tree [ Self-adjusting binary search trees by Sleator et al. 1985] ensures frequently called functions are near the root of the tree
46 Accelerate Lookup with Splay Trees CTXT P() W() X() Y() Z() Splay tree [ Self-adjusting binary search trees by Sleator et al. 1985] ensures frequently called functions are near the root of the tree
47 Accelerate Lookup with Splay Trees CTXT P() W() X() Y() Z() Splay tree [ Self-adjusting binary search trees by Sleator et al. 1985] ensures frequently called functions are near the root of the tree
48 Accelerate Lookup with Splay Trees P() CTXT W() X() Y() Z() Splay tree [ Self-adjusting binary search trees by Sleator et al. 1985] ensures frequently called functions are near the root of the tree
49 Context Should Incorporate Instruction Pointer main() P() Foo(){ *ptr = 100; x = 42; }
50 Context Should Incorporate Instruction Pointer main() P() Foo(){ *ptr = 100; x = 42; } CTXT = Foo: INS 1
51 Context Should Incorporate Instruction Pointer main() P() Foo(){ *ptr = 100; x = 42; } CTXT = Foo: INS 1 CTXT = Foo: INS 2
52 Attributing to Instructions main() Foo() P() CCTNode A CCT node represents a Pin trace CCTLib maintains node Pin trace mapping Each slot in a node represents an instruction in a Pin trace Instructions CTXT
53 Attributing to Instructions main() P() Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow Solution: CCTNode Foo() Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query CTXT Instructions
54 Attributing to Instructions main() P() CCTNode Foo() Instructions CTXT Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow Solution: Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query CTXT
55 Attributing to Instructions main() P() Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow CCTNode Foo() GetContext(2) Solution: Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query 2 CTXT Instructions CTXT
56 Attributing to Instructions main() P() Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow CCTNode Foo() Solution: Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query 2 CTXT Instructions CTXT
57 Attributing to Instructions main() P() CCTNode Foo() Instructions CTXT Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow Solution: Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query CTXT
58 Attributing to Instructions main() Foo() P() CCTNode GetContext(5) Instructions CTXT Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow Solution: Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query CTXT
59 Attributing to Instructions main() P() CCTNode Foo() Instructions CTXT Problem: Mapping IP to Slot at runtime Variable size x86 instructions Non-sequential control flow Solution: Pin s trace-instrumentation to hardwire Slot# as argument to context query routine for an IP Result: Constant time to query CTXT
60 Roadmap CCTLib Ubiquitous call path collection Attributing costs to data objects Evaluation Conclusions
61 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
62 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
63 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
64 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
65 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
66 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
67 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
68 Data-Centric Attribution in CCTLib int MyArray[SZ];! int * Create(){ return malloc( ); }! void Update(int * ptr) { for( ) ptr[i]++; }! int main(){ int * p; if ( ) p = Create(); else p = MyArray; Update(p); } Main() Create() Update() malloc() Associate each data access to its data object Data object Dynamic allocation: Call path of allocation site Static objects: Variable name
69 Data-Centric Attribution How? Record all <AddressRange, VariableName> tuples in a map Instrument all allocation/free routines and maintain <AddressRange, CallPath> tuples in the map At each memory access: search the map for the address! Problems Searching the map on each access is expensive Map needs to be concurrent for threaded programs
70 Data-Centric Attribution using a Balanced Tree Observation: Updates to the map are infrequent Lookups in the maps are frequent! Solution #1: sorted map Keep <AddressRange, Object> in a balanced binary tree Low memory cost O(N) Moderate lookup cost O(log N) Concurrent access is handled by a novel replicated tree data structure
71 Data-Centric Attribution using Shadow Memory Solution #2: shadow memory Application CCTLib
72 Data-Centric Attribution using Shadow Memory Solution #2: shadow memory ObjA Application CCTLib
73 Data-Centric Attribution using Shadow Memory Solution #2: shadow memory ObjA Application ObjA ObjA ObjA ObjA ObjA ObjA ObjA ObjA CCTLib
74 Data-Centric Attribution using Shadow Memory Solution #2: shadow memory ObjA Application CCTLib ObjA ObjA ObjA ObjA ObjB ObjC ObjA ObjA ObjA ObjA ObjB ObjB ObjB ObjB ObjC ObjC ObjC ObjC
75 Data-Centric Attribution using Shadow Memory Solution #2: shadow memory ObjA Application CCTLib ObjA ObjA ObjA ObjA ObjB ObjC ObjA ObjA ObjA ObjA ObjB ObjB ObjB ObjB ObjC ObjC ObjC ObjC
76 Data-Centric Attribution using Shadow Memory Solution #2: shadow memory ObjA Application CCTLib ObjA ObjA ObjA ObjA ObjB ObjC ObjA ObjA ObjA ObjA ObjB ObjB ObjB ObjB ObjC ObjC ObjC ObjC For each memory cell, a shadow cell holds a handle for the memory cell s data object Low lookup cost O(1), high memory cost Shadow memory supports concurrent access CCTLib supports both solutions, clients can choose
77 Roadmap CCTLib Ubiquitous call path collection Attributing costs to data objects Evaluation Conclusions
78 Evaluation Program Running time! in sec Experimental setup: 2.2GHz Intel Sandy Bridge 128GB DDR3 GNU tool chain astr 361 bzip2 161 gcc 70 h264ref 618 hmmer 446 libquantum 462 mcf 320 omnetpp 352 Xalan 295 ROSE 24 LAMMPS 99 LULESH 67
79 Evaluation Spec Int 2006 reference benchmark Program Running time! in sec Experimental setup: 2.2GHz Intel Sandy Bridge 128GB DDR3 GNU tool chain astr 361 bzip2 161 gcc 70 h264ref 618 hmmer 446 libquantum 462 mcf 320 omnetpp 352 Xalan 295 ROSE 24 LAMMPS 99 LULESH 67
80 Evaluation Spec Int 2006 reference benchmark Program Running time! in sec astr 361 bzip2 161 gcc 70 h264ref 618 hmmer 446 libquantum 462 mcf 320 omnetpp 352 Xalan 295 ROSE 24 LAMMPS 99 LULESH 67 Experimental setup: 2.2GHz Intel Sandy Bridge 128GB DDR3 GNU tool chain Source-to-source compiler from LLNL! 3M LOC compiling 70K LOC Deep call chains
81 Evaluation Spec Int 2006 reference benchmark Program Running time! in sec astr 361 bzip2 161 gcc 70 h264ref 618 hmmer 446 libquantum 462 mcf 320 omnetpp 352 Xalan 295 ROSE 24 LAMMPS 99 LULESH 67 Experimental setup: 2.2GHz Intel Sandy Bridge 128GB DDR3 GNU tool chain Source-to-source compiler from LLNL! 3M LOC compiling 70K LOC Deep call chains Molecular dynamics code! 500K LOC Deep call chains Multithreaded
82 Evaluation Spec Int 2006 reference benchmark Program Running time! in sec astr 361 bzip2 161 gcc 70 h264ref 618 hmmer 446 libquantum 462 mcf 320 omnetpp 352 Xalan 295 ROSE 24 LAMMPS 99 LULESH 67 Experimental setup: 2.2GHz Intel Sandy Bridge 128GB DDR3 GNU tool chain Source-to-source compiler from LLNL! 3M LOC compiling 70K LOC Deep call chains Molecular dynamics code! 500K LOC Deep call chains Multithreaded Hydrodynamics mini-app from LLNL! Frequent data allocation and de-allocations Memory bound Multithreaded, Poor scaling
83 Overhead Analysis Call path collection Data-centric! att Time overhead relative to original program (Null Pin tool) Time overhead relative to simple instruction counting Pin tool Memory overhead relative to original program Balanced Tree 30x 4.5x 1.7x 2.0x 1.8x Sh
84 Overhead Analysis Call path collection Data-centric! att Time overhead relative to original program (Null Pin tool) Time overhead relative to simple instruction counting Pin tool Memory overhead relative to original program Balanced Tree 30x 4.5x 1.7x 2.0x 1.8x Sh
85 Overhead Analysis Call path collection Data-centric attribution! Balanced Tree Shadow Memory Time overhead relative to simple instruction counting Pin tool 1.7x 4.5x 2.3x Memory overhead relative to original program 1.8x 2.0x 11x
86 Overhead Analysis Call path collection Data-centric attribution! Balanced Tree Shadow Memory Time overhead relative to simple instruction counting Pin tool 1.7x 4.5x 2.3x Memory overhead relative to original program 1.8x 2.0x 11x
87 CCTLib Scales to Multiple Threads CCTLib overhead of N threads: CCTLib scalability of N threads: Emon(n) Eorig(n) Overhead(1) Overhead(N) Higher scalability is better, 1.0 is ideal CCTLib scalability on LAMMPS Scalability Call path collection Data-centric attribution via Sorted maps Number of threads Data-centric attribution via Shadow memoory
88 Conclusions Many tools can benefit from attributing metrics to full calling contexts and/or data objects Ubiquitous calling context collection was previously considered prohibitively expensive Fine-grain attribution of metrics to calling contexts and data objects is practical Full-precision call path collection and data-centric attribution require only modest space and time overhead Choice of algorithms and data structures was a key to success
89 Conclusions Many tools can benefit from attributing metrics to full calling contexts and/or data objects Ubiquitous calling context collection was previously considered prohibitively expensive Fine-grain attribution of metrics to calling contexts and data objects is practical Full-precision call path collection and data-centric attribution require only modest space and time overhead Choice of algorithms and data structures was a key to success
90 Conclusions Many tools can benefit from attributing metrics to full calling contexts and/or data objects Ubiquitous calling context collection was previously considered prohibitively expensive Fine-grain attribution of metrics to calling contexts and data objects is practical Full-precision call path collection and data-centric attribution require only modest space and time overhead Choice of algorithms and data structures was a key to success
91 Other Complications in Real Programs Complex control flow Signal handling Setjmp-Longjmp C++ exceptions (try-catch)! Thread creation and destruction Maintaining parent-child relationships between threads Scalability to large number of threads
A Fast Instruction Set Simulator for RISC-V
A Fast Instruction Set Simulator for RISC-V Maxim.Maslov@esperantotech.com Vadim.Gimpelson@esperantotech.com Nikita.Voronov@esperantotech.com Dave.Ditzel@esperantotech.com Esperanto Technologies, Inc.
More informationEffective Performance Measurement and Analysis of Multithreaded Applications
Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined
More informationLightweight Memory Tracing
Lightweight Memory Tracing Mathias Payer*, Enrico Kravina, Thomas Gross Department of Computer Science ETH Zürich, Switzerland * now at UC Berkeley Memory Tracing via Memlets Execute code (memlets) for
More informationProject 0: Implementing a Hash Table
Project : Implementing a Hash Table CS, Big Data Systems, Spring Goal and Motivation. The goal of Project is to help you refresh basic skills at designing and implementing data structures and algorithms.
More informationA program execution is memory safe so long as memory access errors never occur:
A program execution is memory safe so long as memory access errors never occur: Buffer overflows, null pointer dereference, use after free, use of uninitialized memory, illegal free Memory safety categories
More informationProcess a program in execution; process execution must progress in sequential fashion. Operating Systems
Process Concept An operating system executes a variety of programs: Batch system jobs Time-shared systems user programs or tasks 1 Textbook uses the terms job and process almost interchangeably Process
More informationPinpointing Data Locality Problems Using Data-centric Analysis
Center for Scalable Application Development Software Pinpointing Data Locality Problems Using Data-centric Analysis Xu Liu XL10@rice.edu Department of Computer Science Rice University Outline Introduction
More informationA Practical Fine-grained Profiler
A Practical Fine-grained Profiler Shasha Wen, Xu Liu! College of William and Mary! {swen, xl10}@cs.wm.edu Milind Chabbi! HPE Lab! milind.m.chabbi@hpe.com Motivation: Computation Redundancy a = m; b = sqrt(a);
More informationYukinori Sato (JAIST, JST CREST) Yasushi Inoguchi (JAIST) Tadao Nakamura (Keio University)
Whole Program Data Dependence Profiling to Unveil Parallel Regions in the Dynamic Execution Yukinori Sato (JAIST, JST CREST) Yasushi Inoguchi (JAIST) Tadao Nakamura (Keio University) 2012 IEEE International
More informationEvolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017
Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &
More informationCS527 Software Security
Security Policies Purdue University, Spring 2018 Security Policies A policy is a deliberate system of principles to guide decisions and achieve rational outcomes. A policy is a statement of intent, and
More informationTrees. (Trees) Data Structures and Programming Spring / 28
Trees (Trees) Data Structures and Programming Spring 2018 1 / 28 Trees A tree is a collection of nodes, which can be empty (recursive definition) If not empty, a tree consists of a distinguished node r
More informationDynamic Access Binary Search Trees
Dynamic Access Binary Search Trees 1 * are self-adjusting binary search trees in which the shape of the tree is changed based upon the accesses performed upon the elements. When an element of a splay tree
More informationCS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11
CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table
More informationDynamic Access Binary Search Trees
Dynamic Access Binary Search Trees 1 * are self-adjusting binary search trees in which the shape of the tree is changed based upon the accesses performed upon the elements. When an element of a splay tree
More informationJAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder
JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance
More informationPERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES
PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES Anish Athalye and Patrick Long Mentors: Austin Clements and Stephen Tu 3 rd annual MIT PRIMES Conference Sequential
More informationAcknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides
Garbage Collection Last time Compiling Object-Oriented Languages Today Motivation behind garbage collection Garbage collection basics Garbage collection performance Specific example of using GC in C++
More informationEnergy-centric DVFS Controlling Method for Multi-core Platforms
Energy-centric DVFS Controlling Method for Multi-core Platforms Shin-gyu Kim, Chanho Choi, Hyeonsang Eom, Heon Y. Yeom Seoul National University, Korea MuCoCoS 2012 Salt Lake City, Utah Abstract Goal To
More informationCS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines
CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell
More informationQ1: /8 Q2: /30 Q3: /30 Q4: /32. Total: /100
ECE 2035(A) Programming for Hardware/Software Systems Fall 2013 Exam Three November 20 th 2013 Name: Q1: /8 Q2: /30 Q3: /30 Q4: /32 Total: /100 1/10 For functional call related questions, let s assume
More informationRuntime. The optimized program is ready to run What sorts of facilities are available at runtime
Runtime The optimized program is ready to run What sorts of facilities are available at runtime Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis token stream Syntactic
More informationHonours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui
Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Projects 1 Information flow analysis for mobile applications 2 2 Machine-learning-guide typestate analysis for UAF vulnerabilities 3 3 Preventing
More informationProgramming Studio #9 ECE 190
Programming Studio #9 ECE 190 Programming Studio #9 Concepts: Functions review 2D Arrays GDB Announcements EXAM 3 CONFLICT REQUESTS, ON COMPASS, DUE THIS MONDAY 5PM. NO EXTENSIONS, NO EXCEPTIONS. Functions
More informationDynamic Trace Analysis with Zero-Suppressed BDDs
University of Colorado, Boulder CU Scholar Electrical, Computer & Energy Engineering Graduate Theses & Dissertations Electrical, Computer & Energy Engineering Spring 4-1-2011 Dynamic Trace Analysis with
More informationImproving Error Checking and Unsafe Optimizations using Software Speculation. Kirk Kelsey and Chen Ding University of Rochester
Improving Error Checking and Unsafe Optimizations using Software Speculation Kirk Kelsey and Chen Ding University of Rochester Outline Motivation Brief problem statement How speculation can help Our software
More informationROSE-CIRM Detecting C-Style Errors in UPC Code
ROSE-CIRM Detecting C-Style Errors in UPC Code Peter Pirkelbauer 1 Chunhuah Liao 1 Thomas Panas 2 Daniel Quinlan 1 1 2 Microsoft Parallel Data Warehouse This work was funded by the Department of Defense
More informationFast, precise dynamic checking of types and bounds in C
Fast, precise dynamic checking of types and bounds in C Stephen Kell stephen.kell@cl.cam.ac.uk Computer Laboratory University of Cambridge p.1 Tool wanted if (obj >type == OBJ COMMIT) { if (process commit(walker,
More informationKruiser: Semi-synchronized Nonblocking Concurrent Kernel Heap Buffer Overflow Monitoring
NDSS 2012 Kruiser: Semi-synchronized Nonblocking Concurrent Kernel Heap Buffer Overflow Monitoring Donghai Tian 1,2, Qiang Zeng 2, Dinghao Wu 2, Peng Liu 2 and Changzhen Hu 1 1 Beijing Institute of Technology
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationSection 7: Wait/Exit, Address Translation
William Liu October 15, 2014 Contents 1 Wait and Exit 2 1.1 Thinking about what you need to do.............................. 2 1.2 Code................................................ 2 2 Vocabulary 4
More informationCS2141 Software Development using C/C++ Debugging
CS2141 Software Development using C/C++ Debugging Debugging Tips Examine the most recent change Error likely in, or exposed by, code most recently added Developing code incrementally and testing along
More informationCpt S 122 Data Structures. Data Structures Trees
Cpt S 122 Data Structures Data Structures Trees Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Motivation Trees are one of the most important and extensively
More informationDynamic Dispatch and Duck Typing. L25: Modern Compiler Design
Dynamic Dispatch and Duck Typing L25: Modern Compiler Design Late Binding Static dispatch (e.g. C function calls) are jumps to specific addresses Object-oriented languages decouple method name from method
More informationMy malloc: mylloc and mhysa. Johan Montelius HT2016
1 Introduction My malloc: mylloc and mhysa Johan Montelius HT2016 So this is an experiment where we will implement our own malloc. We will not implement the world s fastest allocator, but it will work
More informationCSC 2400: Computer Systems. Using the Stack for Function Calls
CSC 24: Computer Systems Using the Stack for Function Calls Lecture Goals Challenges of supporting functions! Providing information for the called function Function arguments and local variables! Allowing
More informationThread-Level Speculation on Off-the-Shelf Hardware Transactional Memory
Thread-Level Speculation on Off-the-Shelf Hardware Transactional Memory Rei Odaira Takuya Nakaike IBM Research Tokyo Thread-Level Speculation (TLS) [Franklin et al., 92] or Speculative Multithreading (SpMT)
More informationProject 0: Implementing a Hash Table
CS: DATA SYSTEMS Project : Implementing a Hash Table CS, Data Systems, Fall Goal and Motivation. The goal of Project is to help you develop (or refresh) basic skills at designing and implementing data
More informationUsing Cache Models and Empirical Search in Automatic Tuning of Applications. Apan Qasem Ken Kennedy John Mellor-Crummey Rice University Houston, TX
Using Cache Models and Empirical Search in Automatic Tuning of Applications Apan Qasem Ken Kennedy John Mellor-Crummey Rice University Houston, TX Outline Overview of Framework Fine grain control of transformations
More informationSHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008
SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationWALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems
: A Writeback-Aware LLC Management for PCM-based Main Memory Systems Bahareh Pourshirazi *, Majed Valad Beigi, Zhichun Zhu *, and Gokhan Memik * University of Illinois at Chicago Northwestern University
More informationRun-Time Monitoring with Adjustable Overhead Using Dataflow-Guided Filtering
Run-Time Monitoring with Adjustable Overhead Using Dataflow-Guided Filtering Daniel Lo, Tao Chen, Mohamed Ismail, and G. Edward Suh Cornell University Ithaca, NY 14850, USA {dl575, tc466, mii5, gs272}@cornell.edu
More informationAnalyzing the Performance of IWAVE on a Cluster using HPCToolkit
Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,
More informationConcurrency, Thread. Dongkun Shin, SKKU
Concurrency, Thread 1 Thread Classic view a single point of execution within a program a single PC where instructions are being fetched from and executed), Multi-threaded program Has more than one point
More informationOverview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading
How C++ Works 1 Overview Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading Motivation There are lot of myths about C++
More informationHigh Performance Managed Languages. Martin Thompson
High Performance Managed Languages Martin Thompson - @mjpt777 Really, what is your preferred platform for building HFT applications? Why do you build low-latency applications on a GC ed platform? Agenda
More informationShared-memory Parallel Programming with Cilk Plus
Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming
More informationChapter 12: Indexing and Hashing. Basic Concepts
Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition
More informationCIRM - Dynamic Error Detection
CIRM - Dynamic Error Detection Peter Pirkelbauer Center for Applied Scientific Computing (CASC) Lawrence Livermore National Laboratory This work was funded by the Department of Defense and used elements
More informationRunning Valgrind on multiple processors: a prototype. Philippe Waroquiers FOSDEM 2015 valgrind devroom
Running Valgrind on multiple processors: a prototype Philippe Waroquiers FOSDEM 2015 valgrind devroom 1 Valgrind and threads Valgrind runs properly multi-threaded applications But (mostly) runs them using
More informationG Programming Languages - Fall 2012
G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter
More informationIntel VTune Amplifier XE
Intel VTune Amplifier XE Vladimir Tsymbal Performance, Analysis and Threading Lab 1 Agenda Intel VTune Amplifier XE Overview Features Data collectors Analysis types Key Concepts Collecting performance
More informationEfficient Memory Shadowing for 64-bit Architectures
Efficient Memory Shadowing for 64-bit Architectures The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Qin Zhao, Derek Bruening,
More informationProcess Concepts. CSC400 - Operating Systems. 3. Process Concepts. J. Sumey
CSC400 - Operating Systems 3. Process Concepts J. Sumey Overview Concurrency Processes & Process States Process Accounting Interrupts & Interrupt Processing Interprocess Communication CSC400 - Process
More informationProgram Profiling. Xiangyu Zhang
Program Profiling Xiangyu Zhang Outline What is profiling. Why profiling. Gprof. Efficient path profiling. Object equality profiling 2 What is Profiling Tracing is lossless, recording every detail of a
More informationRun-time Environments - 3
Run-time Environments - 3 Y.N. Srikant Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of the Lecture n What is run-time
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationMulti-threaded Queries. Intra-Query Parallelism in LLVM
Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted
More informationRun-time Environments
Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction
More informationELP. Effektive Laufzeitunterstützung für zukünftige Programmierstandards. Speaker: Tim Cramer, RWTH Aachen University
ELP Effektive Laufzeitunterstützung für zukünftige Programmierstandards Agenda ELP Project Goals ELP Achievements Remaining Steps ELP Project Goals Goals of ELP: Improve programmer productivity By influencing
More informationRun-time Environments
Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction
More informationResource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe
More informationFast dynamic program analysis Race detection. Konstantin Serebryany May
Fast dynamic program analysis Race detection Konstantin Serebryany May 20 2011 Agenda Dynamic program analysis Race detection: theory ThreadSanitizer: race detector Making ThreadSanitizer
More informationProfiling & Optimization
Lecture 18 Sources of Game Performance Issues? 2 Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes
More informationRun-Time Environments/Garbage Collection
Run-Time Environments/Garbage Collection Department of Computer Science, Faculty of ICT January 5, 2014 Introduction Compilers need to be aware of the run-time environment in which their compiled programs
More informationChapter 3: Processes. Operating System Concepts 8th Edition,
Chapter 3: Processes, Administrivia Friday: lab day. For Monday: Read Chapter 4. Written assignment due Wednesday, Feb. 25 see web site. 3.2 Outline What is a process? How is a process represented? Process
More informationDataCollider: Effective Data-Race Detection for the Kernel
DataCollider: Effective Data-Race Detection for the Kernel John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk Microsoft Windows and Microsoft Research {jerick, madanm, sburckha, kirko}@microsoft.com
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationDEBUGGING: DYNAMIC PROGRAM ANALYSIS
DEBUGGING: DYNAMIC PROGRAM ANALYSIS WS 2017/2018 Martina Seidl Institute for Formal Models and Verification System Invariants properties of a program must hold over the entire run: integrity of data no
More informationCS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017
CS 137 Part 8 Merge Sort, Quick Sort, Binary Search November 20th, 2017 This Week We re going to see two more complicated sorting algorithms that will be our first introduction to O(n log n) sorting algorithms.
More informationUni-Address Threads: Scalable Thread Management for RDMA-based Work Stealing
Uni-Address Threads: Scalable Thread Management for RDMA-based Work Stealing Shigeki Akiyama, Kenjiro Taura The University of Tokyo June 17, 2015 HPDC 15 Lightweight Threads Lightweight threads enable
More informationIntroduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana
Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too
More informationScalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance Jonathan Lifflander*, Esteban Meneses, Harshitha Menon*, Phil Miller*, Sriram Krishnamoorthy, Laxmikant V. Kale* jliffl2@illinois.edu,
More informationResearch on the Implementation of MPI on Multicore Architectures
Research on the Implementation of MPI on Multicore Architectures Pengqi Cheng Department of Computer Science & Technology, Tshinghua University, Beijing, China chengpq@gmail.com Yan Gu Department of Computer
More informationChapter 12: Indexing and Hashing (Cnt(
Chapter 12: Indexing and Hashing (Cnt( Cnt.) Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition
More informationIntroducing the Cray XMT. Petr Konecny May 4 th 2007
Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions
More informationEfficient Data Race Detection for Unified Parallel C
P A R A L L E L C O M P U T I N G L A B O R A T O R Y Efficient Data Race Detection for Unified Parallel C ParLab Winter Retreat 1/14/2011" Costin Iancu, LBL" Nick Jalbert, UC Berkeley" Chang-Seo Park,
More informationNDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness
EECS Electrical Engineering and Computer Sciences P A R A L L E L C O M P U T I N G L A B O R A T O R Y NDSeq: Runtime Checking for Nondeterministic Sequential Specs of Parallel Correctness Jacob Burnim,
More informationDynamic Race Detection with LLVM Compiler
Dynamic Race Detection with LLVM Compiler Compile-time instrumentation for ThreadSanitizer Konstantin Serebryany, Alexander Potapenko, Timur Iskhodzhanov, and Dmitriy Vyukov OOO Google, 7 Balchug st.,
More informationChapter 3: Process-Concept. Operating System Concepts 8 th Edition,
Chapter 3: Process-Concept, Silberschatz, Galvin and Gagne 2009 Chapter 3: Process-Concept Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Silberschatz, Galvin
More informationCustom Memory Allocation For Free
Custom Memory Allocation For Free Alin Jula and Lawrence Rauchwerger Texas A&M University November 4, 2006 Motivation Memory Allocation Matters Goals Speed Data locality Fragmentation Current State of
More informationSummary: Open Questions:
Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization
More informationCompilers. 8. Run-time Support. Laszlo Böszörmenyi Compilers Run-time - 1
Compilers 8. Run-time Support Laszlo Böszörmenyi Compilers Run-time - 1 Run-Time Environment A compiler needs an abstract model of the runtime environment of the compiled code It must generate code for
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors
More informationCSE409, Rob Johnson, Alin Tomescu, November 11 th, 2011 Buffer overflow defenses
Buffer overflow defenses There are two categories of buffer-overflow defenses: - Make it hard for the attacker to exploit buffer overflow o Address space layout randomization o Model checking to catch
More informationMobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency
Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency Mohammadkazem Taram, Ashish Venkat, Dean Tullsen University of California, San Diego The Tension between
More informationHigh Performance Managed Languages. Martin Thompson
High Performance Managed Languages Martin Thompson - @mjpt777 Really, what s your preferred platform for building HFT applications? Why would you build low-latency applications on a GC ed platform? Some
More informationDynamic Binary Instrumentation: Introduction to Pin
Dynamic Binary Instrumentation: Introduction to Pin Instrumentation A technique that injects instrumentation code into a binary to collect run-time information 2 Instrumentation A technique that injects
More informationGarbage Collection. CS 351: Systems Programming Michael Saelee
Garbage Collection CS 351: Systems Programming Michael Saelee = automatic deallocation i.e., malloc, but no free! system must track status of allocated blocks free (and potentially reuse)
More informationLightweight Memory Tracing
Lightweight Memory Tracing Mathias Payer ETH Zurich Enrico Kravina ETH Zurich Thomas R. Gross ETH Zurich Abstract Memory tracing (executing additional code for every memory access of a program) is a powerful
More informationStream Computing using Brook+
Stream Computing using Brook+ School of Electrical Engineering and Computer Science University of Central Florida Slides courtesy of P. Bhaniramka Outline Overview of Brook+ Brook+ Software Architecture
More informationReducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research
Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Joel Hestness jthestness@uwalumni.com Lenni Kuff lskuff@uwalumni.com Computer Science Department University of
More informationRuntime Defenses against Memory Corruption
CS 380S Runtime Defenses against Memory Corruption Vitaly Shmatikov slide 1 Reading Assignment Cowan et al. Buffer overflows: Attacks and defenses for the vulnerability of the decade (DISCEX 2000). Avijit,
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #6: Memory Management CS 61C L06 Memory Management (1) 2006-07-05 Andy Carle Memory Management (1/2) Variable declaration allocates
More informationLecture 03 Bits, Bytes and Data Types
Lecture 03 Bits, Bytes and Data Types Computer Languages A computer language is a language that is used to communicate with a machine. Like all languages, computer languages have syntax (form) and semantics
More informationKampala August, Agner Fog
Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler
More informationMemory Management. Disclaimer: some slides are adopted from book authors slides with permission 1
Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk
More informationTree-Structured Indexes. Chapter 10
Tree-Structured Indexes Chapter 10 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k 25, [n1,v1,k1,25] 25,
More information