Performance and Code Tuning. CSCE 315 Programming Studio, Fall 2017 Tanzir Ahmed

Is Performance Important?
- Performance tends to improve with time
  - Hardware improvements
- Other things can be more important
  - Accuracy
  - Robustness
  - Code readability
- Worrying about it can cause problems
  - "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason, including blind stupidity." (William A. Wulf)

So Why Worry About It?
- Sometimes code performance is critical
  - Very large-scale problems
  - Inefficiencies that make things seem infeasible
- The gains from hardware improvements may be coming to an end
  - Or, at least, more tricks are needed to take advantage of them
  - It is not straightforward to take full advantage of a 48-core machine or 1024 CPUs/GPUs
  - Problems such as inter-core or inter-socket communication overhead and cache efficiency
- Performance engineering will likely be important in making long-term improvements

So, How Can We Improve Performance?
- First, there are ways to improve performance that don't involve code tuning
- Before doing code tuning, you need to know what to tune
  - Again, guesswork is not helpful
- Finally, you can tune your code
  - Carefully, measuring along the way

Performance Increases without Code Tuning
- Lower your standards/requirements (!!)
  - Performance tuning is expensive
  - May require hardware/software optimization
  - Performance is stated as a requirement far more often than it is actually a requirement
- High-level design
  - The overall program structure can play a huge role
- Class/routine design
  - The algorithms used make real differences
  - This can have the largest effect, especially asymptotically
- Interactions with the operating system
  - Hidden OS calls within libraries: their performance affects the overall code
  - Just removing repeated printing to standard output may increase performance significantly for some programs
- Compiler optimizations
  - Automatic optimization
  - Getting better and better, but not perfect
  - Different compilers work differently
- Upgrade hardware
  - Straightforward, if possible
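To make the class/routine design point concrete (algorithms can have the largest effect, especially asymptotically), here is a minimal Python sketch comparing a linear scan with a binary search over the same sorted list. The function names are illustrative, not from the slides. Both give the same answers; only the cost differs, O(n) versus O(log n), and the binary search assumes the input is already sorted.

```python
from bisect import bisect_left


def linear_contains(sorted_items, target):
    """O(n): may examine every element before answering."""
    for item in sorted_items:
        if item == target:
            return True
    return False


def binary_contains(sorted_items, target):
    """O(log n): halves the search range each step; input must be sorted."""
    i = bisect_left(sorted_items, target)  # first index with item >= target
    return i < len(sorted_items) and sorted_items[i] == target


data = list(range(0, 1_000_000, 2))  # 500,000 even numbers, already sorted
print(linear_contains(data, 999_998), binary_contains(data, 999_998))  # True True
print(linear_contains(data, 7), binary_contains(data, 7))              # False False
```

For the 500,000-element list, the binary search needs roughly 19 comparisons per lookup, while the linear scan may touch every element; for large inputs, loop-level tuning of `linear_contains` is unlikely to close that gap.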

Code Profiling
- Pareto rule
  - More than 80% of the time is spent in less than 20% of the code
  - In reality the proportion is often even more skewed (e.g., 5% of the code contributing 90% of the time; just an example)
- Determine where the code is spending time
  - No sense in optimizing where no time is spent
- Provide a measurement basis
  - Determine whether an improvement really improved anything
  - Need to take precise measurements
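Determining whether an improvement really improved anything requires repeatable measurements. A minimal timing-harness sketch follows; `measure` and the two `sum_squares_*` functions are hypothetical names for this illustration. It times each version several times with the monotonic `time.perf_counter` clock, keeps the best run, and checks that both versions return the same answer before the times are compared.

```python
import time


def measure(func, *args, repeat=5):
    """Call func(*args) `repeat` times; return (best time in seconds, result).

    Reporting the minimum of several runs filters out some noise from other
    processes and cold caches.
    """
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution, monotonic clock
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def sum_squares_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_formula(n):
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2
    return (n - 1) * n * (2 * n - 1) // 6


n = 200_000
t_loop, r_loop = measure(sum_squares_loop, n)
t_formula, r_formula = measure(sum_squares_formula, n)
assert r_loop == r_formula  # a faster version is worthless if the answer changes
print(f"loop: {t_loop:.6f} s, formula: {t_formula:.6f} s")
```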

Profiling Techniques
- Profiler
  - Compile with profiling options, and run through the profiler
  - Gets a list of functions/routines, and the amount of time spent in each
- Use the system timer (e.g., $ time ./a.out)
  - Less ideal
  - Might need a test harness for functions
- Graph results for understanding
- Multiple profile results: see how the profile changes for different input types
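For the profiler approach, Python's standard library includes `cProfile` and `pstats` (for compiled C/C++ code, the analogous workflow is building with `-pg` and running `gprof`). The sketch below profiles a small, deliberately wasteful pipeline; `parse`, `expensive_transform` and `pipeline` are hypothetical names invented for this illustration. The report lists functions with the time spent in each, sorted by cumulative time.

```python
import cProfile
import io
import pstats


def parse(lines):
    return [int(x) for x in lines]


def expensive_transform(values):
    # Deliberately wasteful: re-sorts the entire list once per element.
    return [sorted(values)[i] for i in range(len(values))]


def pipeline(n):
    lines = [str(i) for i in range(n, 0, -1)]
    return expensive_transform(parse(lines))


profiler = cProfile.Profile()
profiler.enable()
result = pipeline(2_000)
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
print(report.getvalue())
```

A similar report is available without code changes via `python -m cProfile -s cumulative your_script.py`.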

What Is Tuning?
- Making small-scale adjustments to correct code in order to improve performance
  - After the code is written and working
  - Affects only a small scale: a few lines, or at most one routine
  - Examples: adjusting details of loops and expressions
- Code tuning can sometimes improve code efficiency tremendously
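A typical small-scale loop adjustment is loop-invariant code motion: moving a computation whose value never changes out of the loop body. A minimal sketch with hypothetical function names; both versions compute exactly the same floating-point values, the tuned one simply evaluates `cos(radians(...))` once instead of once per element.

```python
import math


def scale_untuned(values, angle_degrees):
    result = []
    for v in values:
        # The factor is recomputed on every iteration although it never changes.
        result.append(v * math.cos(math.radians(angle_degrees)))
    return result


def scale_tuned(values, angle_degrees):
    # Loop-invariant code motion: compute the constant factor once.
    factor = math.cos(math.radians(angle_degrees))
    return [v * factor for v in values]


vals = [1.0, 2.0, 3.0]
print(scale_untuned(vals, 60) == scale_tuned(vals, 60))  # True
```

Optimizing C/C++ compilers can often perform this hoisting automatically when they can prove the call has no side effects; CPython generally does not, so the manual change may matter more there. Either way, measure before and after.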

What Tuning Is Not
- Reducing lines of code
  - Not an indicator of efficient code
- A guess at what might improve things
  - Know what you're trying, and measure the results
- Optimizing as you go
  - Wait until finished, then go back to improve
  - Optimizing while programming is often a waste
- A first choice for improvement
  - Worry about other details/design first
- Refactoring
  - Refactoring improves code readability and quality, while tuning often diminishes both
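Fewer lines is not the same as faster code. In this sketch (hypothetical function names), the one-line de-duplication rescans a growing prefix of the list for every element, which is O(n^2) overall, while the longer version uses a set for average O(1) membership tests and is O(n). The set version assumes the items are hashable.

```python
def dedupe_one_liner(items):
    # Short, but `x not in items[:i]` copies and scans a prefix: O(n^2) overall.
    return [x for i, x in enumerate(items) if x not in items[:i]]


def dedupe_linear(items):
    # More lines, but set membership is O(1) on average: O(n) overall.
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


data = [3, 1, 3, 2, 1, 4]
print(dedupe_one_liner(data))  # [3, 1, 2, 4]
print(dedupe_linear(data))     # [3, 1, 2, 4]
```

Conversely, `list(dict.fromkeys(items))` is also a one-liner and is linear, which is the point: line count and speed are unrelated.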

Common Inefficiencies
- Unnecessary I/O operations
  - File access is especially slow
- Paging/memory issues
  - Can vary by system
- System calls
  - Require a context switch, which involves OS and scheduling overhead beyond the program's control
- Interpreted languages
  - Instead of being compiled to machine code as a whole, each statement is interpreted and executed individually at run time
(Table source: Code Complete book)
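Unnecessary I/O and system calls often show up together: flushing output after every small write hands each piece to the underlying stream separately. A minimal sketch with hypothetical function names contrasts per-line flushed writes with one batched write. Here `io.StringIO` stands in for a real stream so the two outputs can be compared (no system calls happen in that case); to observe a timing difference, pass `sys.stdout` or an open file instead and measure.

```python
import io


def write_per_line(values, out):
    # Flushing after every line pushes each small piece to the underlying
    # stream separately (for a real file or terminal, typically a write
    # system call each time).
    for v in values:
        out.write(f"{v}\n")
        out.flush()


def write_batched(values, out):
    # Build the full text in memory and hand it to the stream in one call.
    out.write("".join(f"{v}\n" for v in values))
    out.flush()


a, b = io.StringIO(), io.StringIO()
write_per_line(range(5), a)
write_batched(range(5), b)
print(a.getvalue() == b.getvalue())  # True
```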

Operation Costs
- Different operations take different amounts of time
  - Integer division takes longer than other integer operations
    - Much slower than bit-shifting to achieve the same result (when dividing a non-negative value by a power of two)
  - Transcendental functions (sin, etc.) and sqrt take even longer
- Knowing this can help when tuning
- Costs vary by language
  - In C++, private routine calls take about twice the time of an integer operation, and in Java about half the time
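One common way to use operation costs when tuning is to avoid an expensive function entirely when its result is only compared. Because sqrt is monotonic, comparing squared distances selects the same nearest point without any sqrt calls. A minimal sketch with hypothetical function names; in CPython, interpreter overhead may hide much of the saving, so the transformation matters most in compiled code and should be measured before adopting it.

```python
import math


def nearest_with_sqrt(points, target):
    tx, ty = target
    return min(points, key=lambda p: math.sqrt((p[0] - tx) ** 2 + (p[1] - ty) ** 2))


def nearest_squared(points, target):
    # sqrt is monotonic, so the smallest squared distance belongs to the
    # same point as the smallest distance: no sqrt call needed.
    tx, ty = target
    return min(points, key=lambda p: (p[0] - tx) ** 2 + (p[1] - ty) ** 2)


pts = [(5, 5), (1, 2), (-3, 4), (0, -1)]
print(nearest_with_sqrt(pts, (0, 0)), nearest_squared(pts, (0, 0)))  # (0, -1) (0, -1)
```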

An Example
- Stealing an example from Charles Leiserson's Distinguished Lecture on Performance Engineering, February 8, 2017
  - Joint work with Bradley Kuszmaul and Tao Schardl
- Performance engineering a 4K x 4K matrix multiplication
- Machine: dual-socket Intel Xeon E v3 (Haswell), 18 cores, 2.9 GHz, 60 GB of DRAM

Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                            Time (seconds)
Straightforward Python implementation     25,… (> 7 hours)
Straightforward Java implementation       (~40 min)
Straightforward C implementation          (< 10 min)
Parallel loops                            (~1 min)
Parallel divide and conquer               3.80
+ vectorization (streaming the parallel operations)
+ AVX intrinsics (processor commands for vector operations)
Strassen's algorithm                      0.38
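The slides do not include the code behind these timings. As a rough illustration only, here is what a straightforward triple-loop matrix multiplication in Python looks like, next to one classic serial transformation, loop interchange (i-j-k to i-k-j order). The function names, matrix size and random seed are assumptions for this sketch, and it does not reproduce the numbers in the table.

```python
import random


def matmul_naive(A, B):
    """Straightforward i-j-k triple loop over n x n lists of lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_ikj(A, B):
    """Same arithmetic with the j and k loops interchanged (i-k-j order).

    The inner loop now walks along one row of B and one row of C. With
    row-major arrays in C this gives sequential memory access; in Python it
    also lets A[i][k] and the row references be hoisted out of the inner loop.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Ci = C[i]
        for k in range(n):
            a = A[i][k]
            Bk = B[k]
            for j in range(n):
                Ci[j] += a * Bk[j]
    return C


n = 64
rng = random.Random(315)
A = [[rng.random() for _ in range(n)] for _ in range(n)]
B = [[rng.random() for _ in range(n)] for _ in range(n)]
print(matmul_naive(A, B) == matmul_ikj(A, B))  # True
```

Because each C[i][j] still accumulates its products in the same k order, the two versions produce identical floating-point results, which is an easy correctness check to run before timing anything. The later rows of the table (parallelism, vectorization, AVX intrinsics, Strassen's algorithm) go well beyond what a short pure-Python sketch can show.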

The Lesson from This Example
- Code performance engineering can make incredible improvements in performance
  - 67,243 times faster in this case!
  - It is very rewarding
- However, this effort is often meaningless and even harmful when applied in a "tune as you go" style during development
  - Tends to decrease code readability and reusability
  - It is often hard to identify the real bottleneck early on

Remember
- Code readability/maintainability/etc. is usually more important than efficiency
- Always start with well-written code, and only tune at the end
- Measure!
More informationDifference Engine: Harnessing Memory Redundancy in Virtual Machines (D. Gupta et all) Presented by: Konrad Go uchowski
Difference Engine: Harnessing Memory Redundancy in Virtual Machines (D. Gupta et all) Presented by: Konrad Go uchowski What is Virtual machine monitor (VMM)? Guest OS Guest OS Guest OS Virtual machine
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
More informationComputer Architecture. Lecture 8: Virtual Memory
Computer Architecture Lecture 8: Virtual Memory Dr. Ahmed Sallam Suez Canal University Spring 2015 Based on original slides by Prof. Onur Mutlu Memory (Programmer s View) 2 Ideal Memory Zero access time
More informationECE 571 Advanced Microprocessor-Based Design Lecture 12
ECE 571 Advanced Microprocessor-Based Design Lecture 12 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 March 2018 HW#6 will be posted Project will be coming up Announcements
More informationMap3D V58 - Multi-Processor Version
Map3D V58 - Multi-Processor Version Announcing the multi-processor version of Map3D. How fast would you like to go? 2x, 4x, 6x? - it's now up to you. In order to achieve these performance gains it is necessary
More informationWe will give examples for each of the following commonly used algorithm design techniques:
Review This set of notes provides a quick review about what should have been learned in the prerequisite courses. The review is helpful to those who have come from a different background; or to those who
More informationIntroduction to Algorithms
Lecture 1 Introduction to Algorithms 1.1 Overview The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that
More information