Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms

Similar documents
Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms

Energy-Aware Scheduling for Acyclic Synchronous Data Flows on Multiprocessors

Reducing Power Consumption in Data Centers by Jointly Considering VM Placement and Flow Scheduling

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Fault-Tolerant and Secure Data Transmission Using Random Linear Network Coding

Towards Deadline Guaranteed Cloud Storage Services Guoxin Liu, Haiying Shen, and Lei Yu

Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows

Spectral Graph Multisection Through Orthogonality. Huanyang Zheng and Jie Wu CIS Department, Temple University

Complexity results for throughput and latency optimization of replicated and data-parallel workflows

Coalition Formation towards Energy-Efficient Collaborative Mobile Computing. Liyao Xiang, Baochun Li, Bo Li Aug. 3, 2015

Mutually Exclusive Data Dissemination in the Mobile Publish/Subscribe System

EECS 571 Principles of Real-Time Embedded Systems. Lecture Note #8: Task Assignment and Scheduling on Multiprocessor Systems

Machine Learning for Software Engineering

CSE 417 Branch & Bound (pt 4) Branch & Bound

Kyle M. Tarplee 1, Ryan Friese 1, Anthony A. Maciejewski 1, H.J. Siegel 1,2. Department of Electrical and Computer Engineering 2

Mapping pipeline skeletons onto heterogeneous platforms

Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service

Introduction to Mathematical Programming IE406. Lecture 20. Dr. Ted Ralphs

Search Algorithms for Discrete Optimization Problems

Towards Green Cloud Computing: Demand Allocation and Pricing Policies for Cloud Service Brokerage Chenxi Qiu

CHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT

Clustering Analysis for Malicious Network Traffic

Abhishek Pandey Aman Chadha Aditya Prakash

Hardware/Software Codesign

QoS-Aware Service Selection in Geographically Distributed Clouds

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI

The Round Complexity of Distributed Sorting

A List Heuristic for Vertex Cover

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Deadline Guaranteed Service for Multi- Tenant Cloud Storage Guoxin Liu and Haiying Shen

Minimum Weight Constrained Forest Problems. Problem Definition

Venice: Reliable Virtual Data Center Embedding in Clouds

Joint Power Optimization Through VM Placement and Flow Scheduling in Data Centers

On Cluster Resource Allocation for Multiple Parallel Task Graphs

Dynamic Grouping Strategy in Cloud Computing

Randomized Algorithms: Selection

Piccolo. Fast, Distributed Programs with Partitioned Tables. Presenter: Wu, Weiyi Yale University. Saturday, October 15,

OPTIMIZED TASK ALLOCATION IN SENSOR NETWORKS

Outsourcing Privacy-Preserving Social Networks to a Cloud

Research Incubator: Combinatorial Optimization. Dr. Lixin Tao December 9, 2003

EE/CSCI 451: Parallel and Distributed Computation

Introduction to Machine Learning

Virtual Network Embedding with Substrate Support for Parallelization

Lecture 18.1 Multi-Robot Path Planning (3)

PRIORITY-BASED BROADCASTING OF SENSITIVE DATA IN ERROR-PRONE WIRELESS NETWORKS

Toward the joint design of electronic and optical layer protection

Foundations of Artificial Intelligence

Extending SLURM with Support for GPU Ranges

Branch-and-bound: an example

Set Cover with Almost Consecutive Ones Property

Principles of Parallel Algorithm Design: Concurrency and Mapping

Hardware-Software Codesign

Minimizing Clock Domain Crossing in Network on Chip Interconnect

Lecture 9: Load Balancing & Resource Allocation

6 ROUTING PROBLEMS VEHICLE ROUTING PROBLEMS. Vehicle Routing Problem, VRP:

Code generation for modern processors

Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications

6 Randomized rounding of semidefinite programs

L9: Hierarchical Clustering

Guaranteeing Heterogeneous Bandwidth Demand in Multitenant Data Center Networks

EE382V: System-on-a-Chip (SoC) Design

DISTRIBUTED NETWORK RESOURCE ALLOCATION WITH INTEGER CONSTRAINTS. Yujiao Cheng, Houfeng Huang, Gang Wu, Qing Ling

Search Algorithms for Discrete Optimization Problems

A Randomized Algorithm for Minimizing User Disturbance Due to Changes in Cellular Technology

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

Improving Packing Algorithms for Server Consolidation

Financial Optimization ISE 347/447. Lecture 13. Dr. Ted Ralphs

Energy Efficient Big Data Processing at the Software Level

Energy-efficient acceleration of task dependency trees on CPU-GPU hybrids

High-Level Synthesis

11. APPROXIMATION ALGORITHMS

Asymmetry-Aware Work-Stealing Runtimes

Partitioning and Partitioning Tools. Tim Barth NASA Ames Research Center Moffett Field, California USA

COMP9334: Capacity Planning of Computer Systems and Networks

A Multi-Modal Composability Framework for Cyber-Physical Systems

Computational Geometry

Data Mining: Classifier Evaluation. CSCI-B490 Seminar in Computer Science (Data Mining)

Lecture outline. Graph coloring Examples Applications Algorithms

Histogram sort with Sampling (HSS) Vipul Harsh, Laxmikant Kale

Solved by: Syed Zain Ali Bukhari (BSCS)

Column Generation and its applications

Sorting Pearson Education, Inc. All rights reserved.

Solutions to Problem Set 1

Towards Makespan Minimization Task Allocation in Data Centers

Column Generation: Cutting Stock

Outline. Column Generation: Cutting Stock A very applied method. Introduction to Column Generation. Given an LP problem

Department of Electrical and Computer Engineering

Parallel FFT Program Optimizations on Heterogeneous Computers

Fundamentals of Integer Programming

I211: Information infrastructure II

Retiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

A Parallel Interest Matching Algorithm for Distributed- Memory Systems. Elvis Liu Georgios Theodoropoulos

Energy Efficient in Cloud Computing

Construction of Minimum-Weight Spanners Mikkel Sigurd Martin Zachariasen

Introduction to Parallel Computing

Strategies for Replica Placement in Tree Networks

LB-MAP: LOAD-BALANCED MIDDLEBOX ASSIGNMENT IN POLICY-DRIVEN DATA CENTERS

Transcription:

Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms Dawei Li and Jie Wu Temple University, Philadelphia, PA, USA 2012-9-13 ICPP 2012 1

Outline 1. Introduction 2. System Model 3. Relaxation-based Iterative Rounding Algorithm 4. Simulation 5. Conclusion 2

Introduction Dynamic Voltage and Frequency Scaling (DVFS) Motivational Example 3

Dynamic Voltage and Frequency Scaling (DVFS) o Background: o high performance lead to high energy consumption. odvfs allows processors to adjust othe supply voltage or othe execution frequency to operate on different power/energy levels. oit is considered an effective way to achieve the goal of saving energy 4

Motivational Example 5

Motivational Example 6

Motivational Example o Existing partitioning approaches: o Min-min: In the first phase, the set of tasks minimum expected completion times is calculated (for all unassigned tasks). In the second phase, the task with the over-all minimum expected completion time in the set (derived by the first phase) is chosen and assigned to the corresponding processor. Then, this task is removed from the unassigned task set, and the procedure is repeated until all tasks are assigned. omax-min: This heuristic is very similar to the min-min heuristic. The only difference is that, in the second phase, the task with the overall maximum expected completion time among all of the unassigned tasks is chosen and assigned to the corresponding processor. 7

Motivational Example o Partition by min-min omin(30, 12, 15, 10) o Partition by max-min omax(30, 12, 15, 10) o These two approaches are commonly Used to achieve workload balanced partition. Does a workload balanced partition really result in less energy consumption? 8

Motivational Example 9

Motivational Example 10

Outline 1. Introduction 2. System Model 3. Relaxation-based Iterative Rounding Algorithm 4. Simulation 5. Conclusion 2012-9-13 11

System Model Task Model Platform Model Problem Definition 12

Task Model 13

Platform Models 14

Platform Models o Three platform types odependent Platforms without Runtime Adjusting all of the processors must operate at a same frequency, and the same execution frequency cannot be adjusted during runtime. odependent Platforms with Runtime Adjusting all the processors must operate at a shared frequency, and the shared frequency can be adjusted during runtime. oindependent Platforms processors can operate at different frequencies at any time, and can adjust their execution frequencies independently 15

Problem Definition 16

Problem Definition 17

Outline 1. Introduction 2. System Model 3. Relaxation-based Iterative Rounding Algorithm 4. Simulation 5. Conclusion 18

Relaxation-based Iterative Rounding Algorithm (RIRA) Scheduling on Dependent Platforms without Runtime Adjusting Scheduling on Dependent Platforms with Runtime Adjusting Scheduling on Independent Platforms 19

Scheduling on Dependent Platforms without Runtime Adjusting 20

Scheduling on Dependent Platforms without Runtime Adjusting o Formulation: to achieve a partition with the objective of saving energy, the problem can be formulated as: o This binary integer programming problem is NP-hard. o Relaxation: (allow splitting the tasks arbitrarily to achieve the ideal optimal solution.) 21

Scheduling on Dependent Platforms without Runtime Adjusting 22

Scheduling on Dependent Platforms without Runtime Adjusting 23

Scheduling on Dependent Platforms oexample: without Runtime Adjusting 24

Scheduling on Dependent Platforms without Runtime Adjusting o After Partition o The shared frequency is set such that the processor with the greatest workload finishes its tasks exactly at the deadline. oenergy Consumption Comparisons: omin-min heuristic 11.3 omax-min heuristic 10.7 ornra 8.464 orira 8.08 25

Scheduling on Dependent Platforms with Runtime Adjusting oadopt the same partition as that for Dependent Platforms without Runtime Adjusting. oafter partition. owe can determine the optimal frequency setting for each time interval. (sort in ascending order) 26

Scheduling on Dependent Platforms with Runtime Adjusting oenergy Consumption Comparisons omin-min heuristic 10.3375, omax-min heuristic 10.4740, ornra 8.1617, orira 7.8776. 2012-9-12 ICPP 2012 27

Scheduling on Independent Platforms o Problem formulation and relaxation oassign the most influential task, then update the problem. 28

Scheduling on Independent Platforms o Partition by RIRA. o Energy consumption comparisons omin-min heuristic 7.11 omax-min heuristic 8.92 ornra: 6.14. orira: 5.84. 29

Outline 1. Introduction 2. System Model 3. Relaxation-based Iterative Rounding Algorithm 4. Simulations 5. Conclusion 30

Simulations o Settings: oscheduling 20 tasks to 6 processors. ofor each platform type, we randomly generate 20 processor efficiency matrix. ofor each efficiency matrix, we randomly generate 20 sets of task execution. oon a given type of platform, for a given processor efficiency matrix, we compare the average normalized energy consumption (normalized by the ideal optimal solution) of the 20 randomly generated tasks. 31

Simulations 32

Simulations 33

Simulations 34

Outline 1. Introduction 2. System Model 3. Relaxation-based Iterative Rounding Algorithm 4. Simulations 5. Conclusion 35

Conclusion o A Relaxation-based Iterative Rounding Algorithm (RIRA) is proposed for energy-aware task allocating and scheduling. othree typical platform types are considered. Our proposed RIRA is suitable for all of the three platform types. oexperiments and simulations verify the strength of our approach. 2012-9-12 ICPP 2012 36

Thank you! Questions? dawei.li@temple.edu 37