Using Bad Learners to find Good Configurations

Size: px
Start display at page:

Download "Using Bad Learners to find Good Configurations"

Transcription

1 Using Bad Learners to find Good Configurations Vivek Nair Tim Menzies North Carolina State University Norbert Siegmund Sven Apel Bauhaus-Universität Weimar University of Passau

2 Configurable Systems and Variability System

3 Configurable Systems and Variability t_2k_480p24.y4m System

4 Configurable Systems and Variability t_2k_480p24.y4m System trailer_480p24.x264

5 Configurable Systems and Variability Configuration options c1 c2 c3 c4 t_2k_480p24.y4m... System cn trailer_480p24.x264

6 Configurable Systems and Variability Configuration options c1 c2 c3 c4 t_2k_480p24.y4m... System cn trailer_480p24.x264 Non-functional behavior: response time, throughput, etc.

7 Configurable Systems and Variability Configuration options c1 c2 c3 c4 t_2k_480p24.y4m... System cn trailer_480p24.x264 Non-functional behavior: response time, throughput, etc.

8 Performance Optimization is Necessary!

9 Performance Optimization is getting more Complex! Necessary [1] Xu et. al Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software.fse 2015 [2] Van Aken, Dana, et al. "Automatic Database Management System Tuning Through Large-scale Machine Learning." International Conference on Management of Data. ACM, 2017.

10 Performance Optimization is required since Default Configuration is Bad! Necessary Complex [1] Van Aken, Dana, et al. "Automatic Database Management System Tuning Through Large-scale Machine Learning." International Conference on Management of Data. ACM, [2] Herodotou, Herodotos, et al. "Starfish: A Self-tuning System for Big Data Analytics." CIDR

11 Performance Optimization can be Expensive! Evaluation of single instance of their hardware/software co-design problem can take weeks[1] Necessary Complex Default is bad Rolling Sort use-case required 21 days, within a total experimental time of about 2.5 months[2] Test suite generation using Evolutionary Algorithm can take weeks[3] Image recognition workload and speech recognition workload, jobs ran for many hours or days[4] [1] Zuluaga, Marcela, et al. "Active learning for multi-objective optimization." International Conference on Machine Learning [2] Jamshidi, Pooyan, and Giuliano Casale. "An uncertainty-aware approach to optimal configuration of stream processing systems."mascots-2016 [3] Wang, Tiantian, et al. "Searching for better configurations: a rigorous approach to clone evaluation." FSE-2013 [4] Venkataraman, Shivaram, et al. "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics." NSDI

12 Existing Solutions [Guo 13] [Siegmund 12] [Sarkar 15] [Nair 17] Configuration Configuration F1= True F2= False F3= True Response Time = 2100 ms Request F1= True F2= False F3= True Response Time ~ 2100 ms Request Software System [Siegmund 12] Siegmund, Norbert, et al. "Predicting performance via automated feature-interaction detection." ICSE [Guo 13] Guo, Jienmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [Sarkar 15] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015 [Nair 17] Nair, Vivek, et al. "Faster discovery of faster system configurations with spectral learning." ASE Journal to appear. Surrogate System

13 Existing Solutions 2012 [Siegmund 12] 2013 [Guo 13] 2015 [Sarkar 15] Configuration Space [Siegmund 12] Siegmund, Norbert, et al. "Predicting performance via automated feature-interaction detection." ICSE [Guo 13] Guo, Jienmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [Sarkar 15] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015 [Nair 17] Nair, Vivek, et al. "Faster discovery of faster system configurations with spectral learning." ASE Journal to appear [Nair 17]

14 Existing Solutions 2012 [Siegmund 12] 2013 [Guo 13] [Sarkar 15] 1. [Nair 17] Divide the configuration space into training and testing sets; Train 1 Test Configuration Space [Siegmund 12] Siegmund, Norbert, et al. "Predicting performance via automated feature-interaction detection." ICSE [Guo 13] Guo, Jienmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [Sarkar 15] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015 [Nair 17] Nair, Vivek, et al. "Faster discovery of faster system configurations with spectral learning." ASE Journal to appear.

15 Existing Solutions [Siegmund 12] [Guo 13] [Sarkar 15] [Nair 17] Divide the configuration space into training and testing sets; Measure all the configurations in the testing set; Train 1 Test 2 Configuration Space [Siegmund 12] Siegmund, Norbert, et al. "Predicting performance via automated feature-interaction detection." ICSE [Guo 13] Guo, Jienmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [Sarkar 15] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015 [Nair 17] Nair, Vivek, et al. "Faster discovery of faster system configurations with spectral learning." ASE Journal to appear.

16 Existing Solutions [Siegmund 12] [Sarkar 15] [Guo 13] Train [Nair 17] Divide the configuration space into training and testing sets; Measure all the configurations in the testing set; Iteratively sampling configuration from training set to build a model and test the model against testing set; Test 2 Configuration Space [Siegmund 12] Siegmund, Norbert, et al. "Predicting performance via automated feature-interaction detection." ICSE [Guo 13] Guo, Jienmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [Sarkar 15] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015 [Nair 17] Nair, Vivek, et al. "Faster discovery of faster system configurations with spectral learning." ASE Journal to appear.

17 Existing Solutions [Siegmund 12] [Sarkar 15] [Guo 13] Train 1 4 Test [Nair 17] Divide the configuration space into training and testing sets; Measure all the configurations in the testing set; Iteratively sampling configuration from training set to build a model and test the model against testing set; Exit when an accurate model is built (e.g., error = 0.1) 2 Configuration Space [Siegmund 12] Siegmund, Norbert, et al. "Predicting performance via automated feature-interaction detection." ICSE [Guo 13] Guo, Jienmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [Sarkar 15] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015 [Nair 17] Nair, Vivek, et al. "Faster discovery of faster system configurations with spectral learning." ASE Journal to appear.

18 Limitation of Existing Solutions

19 Limitation of Existing Solutions

20 Limitation of Existing Solutions

21 Limitation of Existing Solutions

22 Limitation of Existing Solutions

23 Limitation of Existing Solutions

24 Core Insight

25 Rank Preserving Model

26 Rank Preserving Model

27 Rank Preserving Model

28 Rank Preserving Model

29 Rank Preserving Model

30 Rank Preserving Model

31 Rank Preserving Model

32 Rank Preserving Model Train 1 Test 2 Configuration Space Divide the configuration space into training and testing sets; Measure all the configurations in the testing set; Iteratively sampling configuration from training set to build a model and test the model against testing set;

33 Rank Preserving Model Train 1 4 Test 2 Configuration Space 4. Divide the configuration space into training and testing sets; Measure all the configurations in the testing set; Iteratively sampling configuration from training set to build a model and test the model against testing set; Calculate accuracy model should get progressively more accurate

34 Rank Preserving Model Train 1 4 Test 2 Configuration Space Divide the configuration space into training and testing sets; Measure all the configurations in the testing set; Iteratively sampling configuration from training set to build a model and test the model against testing set; Calculate accuracy model should get progressively more accurate Exit when a model built does not improve (accuracy plateau)

35 Evaluation

36 Baselines Progressive Sampling[1] Sequentially (randomly) sample configuration to build a decision tree till threshold accuracy is reached Projective Sampling[2] Using minimal set of initial sample configurations to project the sampling cost based on a threshold accuracy [1] Guo, Jianmei, et al. Variability-aware performance prediction: A statistical learning approach. ASE-2013 [2] Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems (t)." ASE-2015

37 Research Questions RQ1 - Can inaccurate models accurately rank configurations? RQ2 - How expensive is a rank-based method?

38 Subject Software Systems Video Encoder Databases Utility Compression Numerical Grid benchmark Web server Data processing

39 Subject Software Systems Pooyan Jamshidi Sven Apel Norbert Siegmund Combined effort = 6 computational months

40 Experimental Settings Data (100)

41 Experimental Settings Data (100) Training Pool (40) Prune Pool (20) Testing Pool (40)

42 Experimental Settings Data (100) Training Pool (40) Prune Pool (20) Testing Pool (40)

43 Results

44 RQ1: Can inaccurate models accurately rank configurations?

45 RQ1: Can inaccurate models accurately rank configurations?

46 RQ1: Can inaccurate models accurately rank configurations?

47 RQ1: Can inaccurate models accurately rank configurations? The lower the better

48 RQ1: Can inaccurate models accurately rank configurations? The lower the better

49 RQ1: Can inaccurate models accurately rank configurations? 6 predictedoptima l 1 actualoptimal The lower the better

50 RQ1: Can inaccurate models accurately rank configurations? RD = 1-6 =5 6 predictedoptimal 1 actualoptimal The lower the better

51 RQ1: Can inaccurate models accurately rank configurations?

52 RQ1: Can inaccurate models accurately rank configurations?

53 RQ1: Can inaccurate models accurately rank configurations? 8/1535

54 RQ1: Can inaccurate models accurately rank configurations?

55 RQ2: How expensive is a rank-based approach?

56 RQ2: How expensive is a rank-based approach? The lower the better

57 RQ2: How expensive is a rank-based approach?

58 RQ2: How expensive is a rank-based approach?

59 Conclusion

60

arxiv: v2 [cs.se] 28 Jun 2017

arxiv: v2 [cs.se] 28 Jun 2017 arxiv:1702.05701v2 [cs.se] 28 Jun 2017 ABSTRACT Vivek Nair North Carolina State University Raleigh, North Carolina, USA Norbert Siegmund Bauhaus-University Weimar, Germany Finding the optimally performing

More information

Why CART Works for Variability-Aware Performance Prediction? An Empirical Study on Performance Distributions

Why CART Works for Variability-Aware Performance Prediction? An Empirical Study on Performance Distributions GSDLAB TECHNICAL REPORT Why CART Works for Variability-Aware Performance Prediction? An Empirical Study on Performance Distributions Jianmei Guo, Krzysztof Czarnecki, Sven Apel, Norbert Siegmund, Andrzej

More information

Challenges and Insights from Optimizing Configurable Software Systems

Challenges and Insights from Optimizing Configurable Software Systems Challenges and Insights from Optimizing Configurable Software Systems Norbert Siegmund 2/6/2019 A Joint-Research Endeavor Christian Kästner Pooyan Jamshidi Miguel Velez Sven Apel Alexander Grebhahn Christian

More information

Scaling Exact Multi Objective Combinatorial Optimization by Parallelization

Scaling Exact Multi Objective Combinatorial Optimization by Parallelization Scaling Exact Multi Objective Combinatorial Optimization by Parallelization Jianmei Guo, Edward Zulkoski, Rafael Olaechea, Derek Rayside, Krzysztof Czarnecki, Sven Apel, Joanne M. Atlee Multi Objective

More information

Meta-learning Performance Prediction of Highly Configurable Systems: A Cost-oriented Approach

Meta-learning Performance Prediction of Highly Configurable Systems: A Cost-oriented Approach Meta-learning Performance Prediction of Highly Configurable Systems: A Cost-oriented Approach by Atri Sarkar A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for

More information

Focus: Querying Large Video Datasets with Low Latency and Low Cost

Focus: Querying Large Video Datasets with Low Latency and Low Cost Focus: Querying Large Video Datasets with Low Latency and Low Cost Kevin Hsieh Ganesh Ananthanarayanan, Peter Bodik, Shivaram Venkataraman, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu

More information

Research Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland

Research Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland Research Works to Cope with Big Data Volume and Variety Jiaheng Lu University of Helsinki, Finland Big Data: 4Vs Photo downloaded from: https://blog.infodiagram.com/2014/04/visualizing-big-data-concepts-strong.html

More information

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) Automatic Speech Recognition (ASR) February 2018 Reza Yazdani Aminabadi Universitat Politecnica de Catalunya (UPC) State-of-the-art State-of-the-art ASR system: DNN+HMM Speech (words) Sound Signal Graph

More information

OtterTune. Automatic Database Management System Tuning Through Large-scale Machine Learning

OtterTune. Automatic Database Management System Tuning Through Large-scale Machine Learning OtterTune Automatic Database Management System Tuning Through Large-scale Machine Learning Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang [image source] 2 DBMS Tuning Tuning a DBMS s configuration

More information

Bohr: Similarity Aware Geo-distributed Data Analytics. Hangyu Li, Hong Xu, Sarana Nutanong City University of Hong Kong

Bohr: Similarity Aware Geo-distributed Data Analytics. Hangyu Li, Hong Xu, Sarana Nutanong City University of Hong Kong Bohr: Similarity Aware Geo-distributed Data Analytics Hangyu Li, Hong Xu, Sarana Nutanong City University of Hong Kong 1 Big Data Analytics Analysis Generate 2 Data are geo-distributed Frankfurt US Oregon

More information

SAP HANA Scalability. SAP HANA Development Team

SAP HANA Scalability. SAP HANA Development Team SAP HANA Scalability Design for scalability is a core SAP HANA principle. This paper explores the principles of SAP HANA s scalability, and its support for the increasing demands of data-intensive workloads.

More information

Agenda. Cassandra Internal Read/Write path.

Agenda. Cassandra Internal Read/Write path. Rafiki: A Middleware for Parameter Tuning of NoSQL Data-stores for Dynamic Metagenomics Workloads Ashraf Mahgoub, Paul Wood, Sachandhan Ganesh Purdue University Subrata Mitra Adobe Research Wolfgang Gerlach,

More information

Self Programming Networks

Self Programming Networks Self Programming Networks Is it possible for to Learn the control planes of networks and applications? Operators specify what they want, and the system learns how to deliver CAN WE LEARN THE CONTROL PLANE

More information

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in versions 8 and 9. that must be used to measure, evaluate,

More information

Data Driven Networks

Data Driven Networks Data Driven Networks Is it possible for to Learn the control planes of networks and applications? Operators specify what they want, and the system learns how to deliver CAN WE LEARN THE CONTROL PLANE OF

More information

Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area

Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area Ariel Rabkin Princeton University asrabkin@cs.princeton.edu Work done with Matvey Arye, Siddhartha Sen, Vivek S. Pai, and

More information

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Vazhkudai Instance of Large-Scale HPC Systems ORNL s TITAN (World

More information

Lightweight Streaming-based Runtime for Cloud Computing. Shrideep Pallickara. Community Grids Lab, Indiana University

Lightweight Streaming-based Runtime for Cloud Computing. Shrideep Pallickara. Community Grids Lab, Indiana University Lightweight Streaming-based Runtime for Cloud Computing granules Shrideep Pallickara Community Grids Lab, Indiana University A unique confluence of factors have driven the need for cloud computing DEMAND

More information

DRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE

DRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE DRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael Franklin, Benjamin Recht, Ion Stoica STREAMING WORKLOADS

More information

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters ARTICLE Using Alluxio to Improve the Performance and Consistency of HDFS Clusters Calvin Jia Software Engineer at Alluxio Learn how Alluxio is used in clusters with co-located compute and storage to improve

More information

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan Explore Co-clustering on Job Applications Qingyun Wan SUNet ID:qywan 1 Introduction In the job marketplace, the supply side represents the job postings posted by job posters and the demand side presents

More information

Ten things hyperconvergence can do for you

Ten things hyperconvergence can do for you Ten things hyperconvergence can do for you Francis O Haire Director, Technology & Strategy DataSolutions Evolution of Enterprise Infrastructure 1990s Today Virtualization Server Server Server Server Scale-Out

More information

The Memetic Algorithm for The Minimum Spanning Tree Problem with Degree and Delay Constraints

The Memetic Algorithm for The Minimum Spanning Tree Problem with Degree and Delay Constraints The Memetic Algorithm for The Minimum Spanning Tree Problem with Degree and Delay Constraints Minying Sun*,Hua Wang* *Department of Computer Science and Technology, Shandong University, China Abstract

More information

Intelligent QoS Grid for Virtualized Workloads Gaurav Gupta Tata Consultancy Services

Intelligent QoS Grid for Virtualized Workloads Gaurav Gupta Tata Consultancy Services Intelligent Grid for ized Workloads Gaurav Gupta Tata Consultancy Services Characteristics of Data Analytics BI Image Processing Multi Media Static Content OLTP BigData NoSQL ECM Cloud IOT ERP Web 2.0

More information

Table of Contents 1 Introduction A Declarative Approach to Entity Resolution... 17

Table of Contents 1 Introduction A Declarative Approach to Entity Resolution... 17 Table of Contents 1 Introduction...1 1.1 Common Problem...1 1.2 Data Integration and Data Management...3 1.2.1 Information Quality Overview...3 1.2.2 Customer Data Integration...4 1.2.3 Data Management...8

More information

A REVIEW PAPER ON BIG DATA ANALYTICS

A REVIEW PAPER ON BIG DATA ANALYTICS A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,

More information

Online Optimization of VM Deployment in IaaS Cloud

Online Optimization of VM Deployment in IaaS Cloud Online Optimization of VM Deployment in IaaS Cloud Pei Fan, Zhenbang Chen, Ji Wang School of Computer Science National University of Defense Technology Changsha, 4173, P.R.China {peifan,zbchen}@nudt.edu.cn,

More information

PREDICTIVE DATACENTER ANALYTICS WITH STRYMON

PREDICTIVE DATACENTER ANALYTICS WITH STRYMON PREDICTIVE DATACENTER ANALYTICS WITH STRYMON Vasia Kalavri kalavriv@inf.ethz.ch QCon San Francisco 14 November 2017 Support: ABOUT ME Postdoc at ETH Zürich Systems Group: https://www.systems.ethz.ch/ PMC

More information

Data Driven Networks. Sachin Katti

Data Driven Networks. Sachin Katti Data Driven Networks Sachin Katti Is it possible for to Learn the control planes of networks and applications? Operators specify what they want, and the system learns how to deliver CAN WE LEARN THE CONTROL

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Implementation of Data Mining for Vehicle Theft Detection using Android Application

Implementation of Data Mining for Vehicle Theft Detection using Android Application Implementation of Data Mining for Vehicle Theft Detection using Android Application Sandesh Sharma 1, Praneetrao Maddili 2, Prajakta Bankar 3, Rahul Kamble 4 and L. A. Deshpande 5 1 Student, Department

More information

Intelligent QoS Grid for Virtualized Workloads

Intelligent QoS Grid for Virtualized Workloads Intelligent Grid for ized Workloads Gaurav Gupta Delivery Head, HiTech Industry Solution Unit Tata Consultancy Services 27 May 2016 SDC India 2016 1 Copyright 2016 Tata Consultancy Services Limited Parallel

More information

Be a VDI hero with Nutanix

Be a VDI hero with Nutanix Be a VDI hero with Nutanix Francis O Haire Director, Technology & Strategy DataSolutions Evolution of Enterprise Infrastructure 1990s Today App App App App Virtualization Server Server Server Server Scale-Out

More information

CPU-GPU hybrid computing for feature extraction from video stream

CPU-GPU hybrid computing for feature extraction from video stream LETTER IEICE Electronics Express, Vol.11, No.22, 1 8 CPU-GPU hybrid computing for feature extraction from video stream Sungju Lee 1, Heegon Kim 1, Daihee Park 1, Yongwha Chung 1a), and Taikyeong Jeong

More information

IO Priority: To The Device and Beyond

IO Priority: To The Device and Beyond IO Priority: The Device and Beyond Adam Manzanares Filip Blagojevic Cyril Guyot 2017 Western Digital Corporation or its affiliates. All rights reserved. 7/24/17 1 The Relationship of Throughput and Latency

More information

REAL-TIME AND PARALLEL QUALITY CONTROL PROCESSING ON INDUSTRIAL PRODUCTION LINES

REAL-TIME AND PARALLEL QUALITY CONTROL PROCESSING ON INDUSTRIAL PRODUCTION LINES POLLACK PERIODICA An International Journal for Engineering and Information Sciences DOI: 10.1556/Pollack.3.2008.3.9 Vol. 3, No. 3, pp. 105 111 (2008) www.akademiai.com REAL-TIME AND PARALLEL QUALITY CONTROL

More information

LITERATURE SURVEY (BIG DATA ANALYTICS)!

LITERATURE SURVEY (BIG DATA ANALYTICS)! LITERATURE SURVEY (BIG DATA ANALYTICS) Applications frequently require more resources than are available on an inexpensive machine. Many organizations find themselves with business processes that no longer

More information

A What-if Engine for Cost-based MapReduce Optimization

A What-if Engine for Cost-based MapReduce Optimization A What-if Engine for Cost-based MapReduce Optimization Herodotos Herodotou Microsoft Research Shivnath Babu Duke University Abstract The Starfish project at Duke University aims to provide MapReduce users

More information

Bottleneck Identification and Scheduling in Multithreaded Applications. José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt

Bottleneck Identification and Scheduling in Multithreaded Applications. José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt Bottleneck Identification and Scheduling in Multithreaded Applications José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt Executive Summary Problem: Performance and scalability of multithreaded applications

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis

Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis Janarbek Matai, Dustin Richmond, Dajung Lee, Zac Blair, Qiongzhi Wu, Amin Abazari, Ryan Kastner Department of Computer

More information

Manuel F. Dolz, Juan C. Fernández, Rafael Mayo, Enrique S. Quintana-Ortí. High Performance Computing & Architectures (HPCA)

Manuel F. Dolz, Juan C. Fernández, Rafael Mayo, Enrique S. Quintana-Ortí. High Performance Computing & Architectures (HPCA) EnergySaving Cluster Roll: Power Saving System for Clusters Manuel F. Dolz, Juan C. Fernández, Rafael Mayo, Enrique S. Quintana-Ortí High Performance Computing & Architectures (HPCA) University Jaume I

More information

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance Yujuan Tan, Jian Wen, Zhichao Yan, Hong Jiang, Witawas Srisa-an, Baiping Wang, Hao Luo Outline Background and Motivation

More information

Lecture: Benchmarks, Pipelining Intro. Topics: Performance equations wrap-up, Intro to pipelining

Lecture: Benchmarks, Pipelining Intro. Topics: Performance equations wrap-up, Intro to pipelining Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining 1 Measuring Performance Two primary metrics: wall clock time (response time for a program) and throughput

More information

ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS

ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS Radhakrishnan R 1, Karthik

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Graph Partitioning for Scalable Distributed Graph Computations

Graph Partitioning for Scalable Distributed Graph Computations Graph Partitioning for Scalable Distributed Graph Computations Aydın Buluç ABuluc@lbl.gov Kamesh Madduri madduri@cse.psu.edu 10 th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering

More information

Facilitating Magnetic Recording Technology Scaling for Data Center Hard Disk Drives through Filesystem-level Transparent Local Erasure Coding

Facilitating Magnetic Recording Technology Scaling for Data Center Hard Disk Drives through Filesystem-level Transparent Local Erasure Coding Facilitating Magnetic Recording Technology Scaling for Data Center Hard Disk Drives through Filesystem-level Transparent Local Erasure Coding Yin Li, Hao Wang, Xuebin Zhang, Ning Zheng, Shafa Dahandeh,

More information

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation * Universität Karlsruhe (TH) Technical University of Catalonia (UPC) Barcelona Supercomputing Center (BSC) Samuel

More information

How to ask the right questions about backup reporting products Lindsay Morris Product Architect

How to ask the right questions about backup reporting products Lindsay Morris Product Architect When the SRM Honeymoon s over How to ask the right questions about backup reporting products Lindsay Morris Product Architect The software honeymoon cycle 1. We need some help! Let s see what s out there.

More information

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241

More information

White paper ETERNUS Extreme Cache Performance and Use

White paper ETERNUS Extreme Cache Performance and Use White paper ETERNUS Extreme Cache Performance and Use The Extreme Cache feature provides the ETERNUS DX500 S3 and DX600 S3 Storage Arrays with an effective flash based performance accelerator for regions

More information

Evolving To The Big Data Warehouse

Evolving To The Big Data Warehouse Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from

More information

Oracle Database 11g: Real Application Testing & Manageability Overview

Oracle Database 11g: Real Application Testing & Manageability Overview Oracle Database 11g: Real Application Testing & Manageability Overview Top 3 DBA Activities Performance Management Challenge: Sustain Optimal Performance Change Management Challenge: Preserve Order amid

More information

Analytics Research Internship at Hewlett Packard Labs

Analytics Research Internship at Hewlett Packard Labs Analytics Research Internship at Hewlett Packard Labs Stefanie Deo Mentor: Mehran Kafai September 12, 2016 First, another opportunity that came my way but didn t pan out: Data Science Internship at Intuit!

More information

Feedback Arc Set in Bipartite Tournaments is NP-Complete

Feedback Arc Set in Bipartite Tournaments is NP-Complete Feedback Arc Set in Bipartite Tournaments is NP-Complete Jiong Guo 1 Falk Hüffner 1 Hannes Moser 2 Institut für Informatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany

More information

QoS Management of Web Services

QoS Management of Web Services QoS Management of Web Services Zibin Zheng (Ben) Supervisor: Prof. Michael R. Lyu Department of Computer Science & Engineering The Chinese University of Hong Kong Dec. 10, 2010 Outline Introduction Web

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

Grid Scheduler. Grid Information Service. Local Resource Manager L l Resource Manager. Single CPU (Time Shared Allocation) (Space Shared Allocation)

Grid Scheduler. Grid Information Service. Local Resource Manager L l Resource Manager. Single CPU (Time Shared Allocation) (Space Shared Allocation) Scheduling on the Grid 1 2 Grid Scheduling Architecture User Application Grid Scheduler Grid Information Service Local Resource Manager Local Resource Manager Local L l Resource Manager 2100 2100 2100

More information

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Berni Schiefer schiefer@ca.ibm.com Tuesday March 17th at 12:00

More information

Automated Database Workload Characterization, Mapping, and Tuning through Machine Learning. Abstract

Automated Database Workload Characterization, Mapping, and Tuning through Machine Learning. Abstract Automated Database Workload Characterization, Mapping, and Tuning through Machine Learning Madeline MacDonald University of Utah UUCS-18-007 School of Computing University of Utah Salt Lake City, UT 84112

More information

Software Development of Automatic Data Collector for Bus Route Planning System

Software Development of Automatic Data Collector for Bus Route Planning System International Journal of Electrical and Computer Engineering (IJECE) Vol. 5, No. 1, February 2015, pp. 150~157 ISSN: 2088-8708 150 Software Development of Automatic Data Collector for Bus Route Planning

More information

Interactive Branched Video Streaming and Cloud Assisted Content Delivery

Interactive Branched Video Streaming and Cloud Assisted Content Delivery Interactive Branched Video Streaming and Cloud Assisted Content Delivery Niklas Carlsson Linköping University, Sweden @ Sigmetrics TPC workshop, Feb. 2016 The work here was in collaboration... Including

More information

Review on Managing RDF Graph Using MapReduce

Review on Managing RDF Graph Using MapReduce Review on Managing RDF Graph Using MapReduce 1 Hetal K. Makavana, 2 Prof. Ashutosh A. Abhangi 1 M.E. Computer Engineering, 2 Assistant Professor Noble Group of Institutions Junagadh, India Abstract solution

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

Customizing Progressive JPEG for Efficient Image Storage

Customizing Progressive JPEG for Efficient Image Storage Customizing Progressive JPEG for Efficient Image Storage Eddie Yan Kaiyuan Zhang Xi Wang Karin Strauss Luis Ceze HotStorage 17 July 11, 2017 2 2 2 2 2 2 2 2 2 Summary Today s image hosts need to store

More information

Profile-Guided Program Simplification for Effective Testing and Analysis

Profile-Guided Program Simplification for Effective Testing and Analysis Profile-Guided Program Simplification for Effective Testing and Analysis Lingxiao Jiang Zhendong Su Program Execution Profiles A profile is a set of information about an execution, either succeeded or

More information

VarexJ: A Variability-Aware Java Interpreter

VarexJ: A Variability-Aware Java Interpreter VarexJ: A Variability-Aware Java Interpreter Testing Configurable Systems Jens Meinicke, Chu-Pan Wong, Christian Kästner FOSD Meeting 2015 Feature Interaction Jens Meinicke VarexJ - Testing Configurable

More information

Improving Face Recognition by Exploring Local Features with Visual Attention

Improving Face Recognition by Exploring Local Features with Visual Attention Improving Face Recognition by Exploring Local Features with Visual Attention Yichun Shi and Anil K. Jain Michigan State University Difficulties of Face Recognition Large variations in unconstrained face

More information

Scheduling of Independent Tasks in Cloud Computing Using Modified Genetic Algorithm (FUZZY LOGIC)

Scheduling of Independent Tasks in Cloud Computing Using Modified Genetic Algorithm (FUZZY LOGIC) Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 9, September 2015,

More information

Tim Ebringer The university of Melbourne, Australia Li Sun RMIT university, Australia Serdar Boztas RMIT university, Australia VB 2008 Ottawa,

Tim Ebringer The university of Melbourne, Australia Li Sun RMIT university, Australia Serdar Boztas RMIT university, Australia VB 2008 Ottawa, A FAST RANDOMNESS TEST THAT PRESERVES LOCAL DETAIL Tim Ebringer The university of Melbourne, Australia Li Sun RMIT university, Australia Serdar Boztas RMIT university, Australia VB 2008 Ottawa, Canada,

More information

Testing Properties of Dataflow Program Operators

Testing Properties of Dataflow Program Operators Testing Properties of Dataflow Program Operators Zhihong Xu Martin Hirzel Gregg Rothermel Kun-Lung Wu U. of Nebraska IBM Research U. of Nebraska IBM Research Presentation at ASE, November 2013 Big Data

More information

Patterns that Matter

Patterns that Matter Patterns that Matter Describing Structure in Data Matthijs van Leeuwen Leiden Institute of Advanced Computer Science 17 November 2015 Big Data: A Game Changer in the retail sector Predicting trends Forecasting

More information

Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science

Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science 310 Million + Current Domain Names 11 Billion+ Historical Domain Profiles 5 Million+ New Domain Profiles Daily

More information

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo Evolution of Big Data Architectures@ Facebook Architecture Summit, Shenzhen, August 2012 Ashish Thusoo About Me Currently Co-founder/CEO of Qubole Ran the Data Infrastructure Team at Facebook till 2011

More information

HIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS

HIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS HIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS OVERVIEW When storage demands and budget constraints collide, discovery suffers. And it s a growing problem. Driven by ever-increasing performance and

More information

Contents. Part I Setting the Scene

Contents. Part I Setting the Scene Contents Part I Setting the Scene 1 Introduction... 3 1.1 About Mobility Data... 3 1.1.1 Global Positioning System (GPS)... 5 1.1.2 Format of GPS Data... 6 1.1.3 Examples of Trajectory Datasets... 8 1.2

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

A Comprehensive Analytical Performance Model of DRAM Caches

A Comprehensive Analytical Performance Model of DRAM Caches A Comprehensive Analytical Performance Model of DRAM Caches Authors: Nagendra Gulur *, Mahesh Mehendale *, and R Govindarajan + Presented by: Sreepathi Pai * Texas Instruments, + Indian Institute of Science

More information

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition

More information

FSE Clone- Based and Interactive Recommendation for Modifying Pasted Code

FSE Clone- Based and Interactive Recommendation for Modifying Pasted Code FSE 2015 Clone- Based and Interactive Recommendation for Modifying Pasted Code Yun Lin, Xin Peng, Zhenchang Xing, Diwen Zheng, and Wenyun Zhao Software School, Fudan University pengxin@fudan.edu.cn http://www.se.fudan.edu.cn/pengxin

More information

AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH Snowflake Computing Inc. All Rights Reserved

AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH Snowflake Computing Inc. All Rights Reserved AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH 2019 SNOWFLAKE Our vision Allow our customers to access all their data in one place so they can make actionable decisions anytime, anywhere, with any number

More information

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge

More information

ITIL Capacity Management Deep Dive

ITIL Capacity Management Deep Dive ITIL Capacity Management Deep Dive Chris Molloy IBM Distinguished Engineer International Business Machines Agenda IBM Global Services ITIL Business Model ITIL Architecture ITIL Capacity Management Introduction

More information

Experimental Study of Virtual Machine Migration in Support of Reservation of Cluster Resources

Experimental Study of Virtual Machine Migration in Support of Reservation of Cluster Resources Experimental Study of Virtual Machine Migration in Support of Reservation of Cluster Resources Ming Zhao, Renato J. Figueiredo Advanced Computing and Information Systems (ACIS) Electrical and Computer

More information

Design of a Dynamic Data-Driven System for Multispectral Video Processing

Design of a Dynamic Data-Driven System for Multispectral Video Processing Design of a Dynamic Data-Driven System for Multispectral Video Processing Shuvra S. Bhattacharyya University of Maryland at College Park ssb@umd.edu With contributions from H. Li, K. Sudusinghe, Y. Liu,

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

Modification and Evaluation of Linux I/O Schedulers

Modification and Evaluation of Linux I/O Schedulers Modification and Evaluation of Linux I/O Schedulers 1 Asad Naweed, Joe Di Natale, and Sarah J Andrabi University of North Carolina at Chapel Hill Abstract In this paper we present three different Linux

More information

Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries

Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries Chris Calvert, CISSP, CISM Director of Solutions Innovation Copyright 2013 Hewlett-Packard Development

More information

A METRIC BASED EVALUATION OF TEST CASE PRIORITATION TECHNIQUES- HILL CLIMBING, REACTIVE GRASP AND TABUSEARCH

A METRIC BASED EVALUATION OF TEST CASE PRIORITATION TECHNIQUES- HILL CLIMBING, REACTIVE GRASP AND TABUSEARCH A METRIC BASED EVALUATION OF TEST CASE PRIORITATION TECHNIQUES- HILL CLIMBING, REACTIVE GRASP AND TABUSEARCH 1 M.Manjunath, 2 N.Backiavathi 1 PG Scholar, Department of Information Technology,Jayam College

More information

Machine Learning for Software Engineering

Machine Learning for Software Engineering Machine Learning for Software Engineering Single-State Meta-Heuristics Prof. Dr.-Ing. Norbert Siegmund Intelligent Software Systems 1 2 Recap: Goal is to Find the Optimum Challenges of general optimization

More information

A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES

A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES Zui Zhang, Kun Liu, William Wang, Tai Zhang and Jie Lu Decision Systems & e-service Intelligence Lab, Centre for Quantum Computation

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Fujitsu/Fujitsu Labs Technologies for Big Data in Cloud and Business Opportunities

Fujitsu/Fujitsu Labs Technologies for Big Data in Cloud and Business Opportunities Fujitsu/Fujitsu Labs Technologies for Big Data in Cloud and Business Opportunities Satoshi Tsuchiya Cloud Computing Research Center Fujitsu Laboratories Ltd. January, 2012 Overview: Fujitsu s Cloud and

More information

Netezza The Analytics Appliance

Netezza The Analytics Appliance Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for

More information

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been

More information

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage

More information

NoVA MySQL October Meetup. Tim Callaghan VP/Engineering, Tokutek

NoVA MySQL October Meetup. Tim Callaghan VP/Engineering, Tokutek NoVA MySQL October Meetup TokuDB and Fractal Tree Indexes Tim Callaghan VP/Engineering, Tokutek 2012.10.23 1 About me, :) Mark Callaghan s lesser-known but nonetheless smart brother. [C. Monash, May 2010]

More information

SHARDS & Talus: Online MRC estimation and optimization for very large caches

SHARDS & Talus: Online MRC estimation and optimization for very large caches SHARDS & Talus: Online MRC estimation and optimization for very large caches Nohhyun Park CloudPhysics, Inc. Introduction Efficient MRC Construction with SHARDS FAST 15 Waldspurger at al. Talus: A simple

More information

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF Conference 2017 The Data Challenges of the LHC Reda Tafirout, TRIUMF Outline LHC Science goals, tools and data Worldwide LHC Computing Grid Collaboration & Scale Key challenges Networking ATLAS experiment

More information