DCBench: a Data Center Benchmark Suite
1 DCBench: a Data Center Benchmark Suite. Zhen Jia ( 贾禛 ), Institute of Computing Technology, Chinese Academy of Sciences. Workshop in conjunction with CCF, October 31, 2013, Guilin
2 Workload Spectrum: CPU intensive, memory intensive, I/O intensive (figure from Intel)
3 Workload Spectrum Data Centers
4 Why Benchmarking? Sometimes there is a solution.
5 Why Benchmarking? What about the solution when
6 Benchmark's Role in Computer Science: Benchmarking is the quantitative foundation of computer system and architecture research. Benchmarks are used to experimentally determine the benefits of new designs. (C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. PACT 2008)
7 State of Practice Benchmark Suites: SPEC CPU, SPEC Web, HPCC, PARSEC, TPC-C, Gridmix, YCSB
8 Data Centers [1]. Distinguishing features: massive scale, mixed workloads. Workload classification: online services (service), e.g. Web search; offline processing (data analysis), e.g. MapReduce programs. [1] Barroso et al., The Datacenter as a Computer, 2009
9 Previous Work: CloudSuite (Ferdman et al., Clearing the Clouds, ASPLOS 2012). Six scale-out workloads. Service workloads: Web search, Web serving, media streaming, data serving. Data analysis workload: data analytic (Bayes). Also: software testing. CloudSuite leans toward service workloads!
10 Scale-out Performance of Data Analysis Workloads [Figure: speedup of data analysis workloads, including the CloudSuite data analytic (Bayes) workload.] Data analysis workloads are diverse!
11 Content: Background and Motivation; DCBench; Workloads Characterization
12 DCBench. DCBench workloads: scale-out service, data analysis, VM operation. Released in July 2013. Web site:
13 Methodology of Choosing Workloads
Step 1: Rank main websites and web services according to page views and daily visitors.
Step 2: Decompose the service programs into algorithms and basic operations.
Step 3: Select algorithms and basic operations according to their popularity.
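The three steps above can be read as a small ranking-and-selection pipeline. The following is a minimal sketch under stated assumptions, not the authors' tooling: the category names, traffic shares, and the category-to-algorithm mapping are made-up placeholders, and `select_algorithms` is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict

# Step 1 (assumed input): share of top-site traffic per service category.
# These numbers are placeholders, not values from the slides.
category_share = {"search": 0.5, "social": 0.3, "commerce": 0.2}

# Step 2 (assumed input): algorithms each service category decomposes into.
category_algorithms = {
    "search": ["pagerank", "grep", "sort"],
    "social": ["recommendation", "grep"],
    "commerce": ["recommendation", "sort"],
}


def select_algorithms(shares, decomposition, top_k):
    """Step 3: score each algorithm by the summed share of the services using it."""
    score = defaultdict(float)
    for category, algorithms in decomposition.items():
        for name in algorithms:
            score[name] += shares[category]
    # Sort by descending score, then by name so ties resolve deterministically.
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


selected = select_algorithms(category_share, category_algorithms, top_k=2)
print(selected)
```

Weighting by summed share is only one way to express "popularity"; counting how many services use an algorithm would be an equally plausible reading of Step 3.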
14 Step 1: Ranking
15 Top Sites on the Web [Pie chart. Categories as transcribed: Search Engine, Electronic Commerce, Others, Social Network, Media Streaming. Slice labels as transcribed: 5%, 15%, 15%, 40%, 25%.] More details in
16 Step 2: Decomposing
17 Algorithms in Search Engine: graph mining, grep & segmentation, word count, PageRank, sort, vector calculation (figure from The Anatomy of a Large-Scale Hypertextual Web Search Engine)
18 Algorithms in Recommendation Subsystems
19 Summary of Anatomy of Common Services [Pie chart: Top Sites on the Web, with categories Search Engine, Electronic Commerce, Others, Social Network, Media Streaming.] The slide lists the algorithms used in search, in social networks, and in electronic commerce. Algorithms named across the three lists: PageRank, segmentation, feature reduction, grep, statistical counting, sort, recommendation, clustering, classification, association rule mining, warehouse operation.
20 Step 3: Selecting
21 Top Operations and Algorithms [Pie chart: Top Sites on the Web, with categories Search Engine, Electronic Commerce, Others, Social Network, Media Streaming.] Highlighted operations and algorithms: Grep, PageRank, Recommendation
22 Main Algorithms in Data Centers: basic operation, segmentation, classification, warehouse operation, cluster, feature reduction, recommendation, association rule mining, vector calculate, graph mining
23 Overview of DCBench
Category | Workload | Programming model | Language | Source
Basic operation | Sort | MapReduce | Java | Hadoop
Basic operation | Wordcount | MapReduce | Java | Hadoop
Basic operation | Grep | MapReduce | Java | Hadoop
Classification | Naïve Bayes | MapReduce | Java | Mahout
Classification | Support Vector Machine | MapReduce | Java | Implemented by ourselves
Cluster | K-means | MapReduce | Java | Mahout
Cluster | K-means | MPI | C++ | IBM PML
Cluster | Fuzzy k-means | MapReduce | Java | Mahout
Cluster | Fuzzy k-means | MPI | C++ | IBM PML
Recommendation | Item based Collaborative Filtering | MapReduce | Java | Mahout
Association rule mining | Frequent pattern growth | MapReduce | Java | Mahout
Segmentation | Hidden Markov model | MapReduce | Java | Implemented by ourselves
24 Overview of DCBench (Cont.)
Category | Workload | Programming model | Language | Source
Warehouse operation | Database operations | MapReduce | Java | Hive bench
Feature reduction | Principal Component Analysis | MPI | C++ | IBM PML
Feature reduction | Kernel Principal Component Analysis | MPI | C++ | IBM PML
Vector calculate | Paper similarity analysis | All Pairs | C&C++ | Implemented by ourselves
Graph mining | Breadth first search | MPI | C++ | Graph500
Graph mining | Pagerank | MapReduce | Java | Mahout
Service | Search engine | C/S | Java | Implemented by ourselves
Service | Auction | C/S | Java | Rubis
Service | Media streaming | C/S | Java | Cloudsuite
25 Content: Background and Motivation; DCBench; Workloads Characterization [2]. [2] Zhen Jia et al., Characterizing Data Analysis Workloads in Data Centers, IISWC 2013 (Best Paper)
26 Compared Benchmarks
Field | Suite | Workloads
Scale-out workloads | CloudSuite v1 | Web search, data serving, Web serving, media streaming, software testing
HPC | HPCC | HPL, STREAM, PTRANS, RandomAccess, DGEMM, FFT, Comm
CPU | SPEC CPU 2006 | SPEC INT, SPEC FP, PARSEC
Web | SPEC Web 2005 | TPC-W
Scale-out service workloads share many characteristics with traditional service workloads, so we refer to both simply as service workloads.
27 Breakdown of Executed Instructions [Figure: percentage of kernel and application instructions (0% to 100%) for data analysis and service workloads. Workloads shown: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECWeb, TPC-W, SPECFP, SPECINT, PARSEC, HPCC DGEMM, HPCC FFT, HPCC HPL, HPCC PTRANS, HPCC RandomAccess, HPCC STREAM.] Data analysis workloads execute more application-level instructions. Service workloads have higher percentages of kernel-level instructions.
28 Architecture Block Diagram Figure from Intel
29 Pipeline Stalls [Figure: pipeline stall breakdown for data analysis and service workloads.] Service workloads have more RAT (Register Allocation Table) stalls. Data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls. Front-end stalls!
30 Main reason for pipeline stalls: the memory wall (figure from The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms)
31 Reasons for Front-End Stalls: High ICache misses and ITLB misses cause front-end stalls. [Figure: L1 ICache misses per K instructions, data analysis vs. service workloads.]
32 L2 Cache Behaviors: Data analysis workloads have good L2 cache behaviors. [Figure: L2 cache misses per K instructions, data analysis vs. service workloads.]
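The cache slides report misses per K (thousand) instructions. As a rough illustration of how that metric is derived from two raw event counts, here is a minimal sketch; the function name and the counter values are hypothetical and are not measurements from the talk.

```python
def misses_per_kilo_instruction(misses, instructions):
    """Return the number of misses per 1000 retired instructions (MPKI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000.0 / instructions


# Hypothetical counter readings for one run of one workload.
l2_mpki = misses_per_kilo_instruction(misses=2_500_000, instructions=1_000_000_000)
print(f"L2 MPKI: {l2_mpki:.2f}")
```

Normalizing by instruction count rather than by elapsed time is what lets workloads with very different run lengths share one axis.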
33 LLC Behaviors: Data center workloads have good LLC behaviors, better than most of the HPC workloads. [Figure: percentage of L2 misses satisfied by L3, 0% to 100%.]
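One plausible way to compute the "L2 misses satisfied by L3" percentage from raw miss counts is sketched below. It assumes that every L2 miss probes L3 and that every counted L3 miss originated from an L2 miss; the slides do not state how the authors derived the figure, and the function name and numbers are illustrative only.

```python
def fraction_satisfied_by_l3(l2_misses, l3_misses):
    """Fraction of L2 misses that hit in L3, assuming every L2 miss probes L3."""
    if l2_misses <= 0:
        raise ValueError("l2_misses must be positive")
    if not 0 <= l3_misses <= l2_misses:
        raise ValueError("l3_misses must be between 0 and l2_misses")
    return (l2_misses - l3_misses) / l2_misses


# Hypothetical counts: one quarter of the L2 misses also miss in L3.
ratio = fraction_satisfied_by_l3(l2_misses=1_000_000, l3_misses=250_000)
print(f"{ratio:.0%} of L2 misses satisfied by L3")
```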
34 Branch Prediction: Data analysis workloads have pretty good branch behaviors. Branches of service workloads are hard to predict. [Figure: branch misprediction ratio, 0.00% to 12.00%, data analysis vs. service workloads.]
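The branch misprediction ratio is the share of executed branches that the predictor got wrong. A minimal sketch of the calculation follows; the workload names and counts are hypothetical placeholders, not data from the talk.

```python
def branch_misprediction_ratio(mispredicted, retired_branches):
    """Return mispredicted branches as a fraction of all retired branches."""
    if retired_branches <= 0:
        raise ValueError("retired_branches must be positive")
    return mispredicted / retired_branches


# Hypothetical (mispredicted, retired) branch counts for two workloads.
samples = {
    "workload_a": (30_000, 1_000_000),
    "workload_b": (90_000, 1_000_000),
}
ratios = {
    name: branch_misprediction_ratio(missed, total)
    for name, (missed, total) in samples.items()
}
for name, value in ratios.items():
    print(f"{name}: {value:.2%}")
```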
35 Some Observations
Data analysis workloads are different from scale-out service workloads and traditional workloads.
For data analysis workloads, more application-level instructions are executed.
High ICache and ITLB misses. Impact: a high percentage of front-end stalls. Cause: massive scale of software infrastructure, high-level languages, third-party libraries. Rethink the design of the ICache or ITLB, or simplify the software stack.
Low-level caches are good for data analysis workloads. Pay more attention to the area and energy of caches.
The branch predictor is quite effective.
36 More information:
37 Back up
38 Data Center vs. Big Data [Diagram with regions labeled Data center and Big Data, covering: scale-out service, VM operation, big data analytic, data-intensive HPC.]
39 Each Algorithm's Application Scenarios
Algorithm | Application scenarios
Sort | Ranking pages according to their importance (PageRank); sorting pages by ID (Web storage in database)
Wordcount | Calculating TF-IDF base information, such as term frequency; counting user operations to analyze their social behavior (in Wolfram Alpha)
Grep | Log analysis; Web information extraction; fuzzy search
Naïve Bayes | Spam recognition (Spam Filtering with Naive Bayes); bioinformatics (Naïve Bayesian Classifier for Rapid Assignment of RNA Sequences into the New Bacterial Taxonomy)
Support Vector Machine | Classification (question classification); image processing (image annotation); text categorization
40 Each Algorithm's Application Scenarios (Cont.)
Algorithm | Application scenarios
K-means | Image processing (fast image segmentation); high-resolution landform classification
Item based Collaborative Filtering | Amazon recommender system
Hidden Markov model | Bioinformatics (protein homology detection); speech recognition, handwriting recognition; word segmentation
Frequent pattern growth | Market analysis; data mining in business (identifying competitive suppliers in supply chain management); intrusion detection; query recommendation
Warehouse operation | Taobao Yunti system; Facebook; Yahoo!
Principal Component Analysis | Computer vision; pattern recognition; face representation and recognition
More informationEfficient On-Demand Operations in Distributed Infrastructures
Efficient On-Demand Operations in Distributed Infrastructures Steve Ko and Indranil Gupta Distributed Protocols Research Group University of Illinois at Urbana-Champaign 2 One-Line Summary We need to design
More informationTPCX-BB (BigBench) Big Data Analytics Benchmark
TPCX-BB (BigBench) Big Data Analytics Benchmark Bhaskar D Gowda Senior Staff Engineer Analytics & AI Solutions Group Intel Corporation bhaskar.gowda@intel.com 1 Agenda Big Data Analytics & Benchmarks Industry
More informationLarge-Scale GPU programming
Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationDistributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju
Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale
More informationSecurity-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat
Security-Aware Processor Architecture Design CS 6501 Fall 2018 Ashish Venkat Agenda Common Processor Performance Metrics Identifying and Analyzing Bottlenecks Benchmarking and Workload Selection Performance
More informationTwitter data Analytics using Distributed Computing
Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE
More informationPerformance Analysis of KDD Applications using Hardware Event Counters. CAP Theme 2.
Performance Analysis of KDD Applications using Hardware Event Counters CAP Theme 2 http://cap.anu.edu.au/cap/projects/kddmemperf/ Peter Christen and Adam Czezowski Peter.Christen@anu.edu.au Adam.Czezowski@anu.edu.au
More informationAn Introduction to Search Engines and Web Navigation
An Introduction to Search Engines and Web Navigation MARK LEVENE ADDISON-WESLEY Ал imprint of Pearson Education Harlow, England London New York Boston San Francisco Toronto Sydney Tokyo Singapore Hong
More informationEXTRACT DATA IN LARGE DATABASE WITH HADOOP
International Journal of Advances in Engineering & Scientific Research (IJAESR) ISSN: 2349 3607 (Online), ISSN: 2349 4824 (Print) Download Full paper from : http://www.arseam.com/content/volume-1-issue-7-nov-2014-0
More informationAccelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads
WHITE PAPER Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads December 2014 Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents
More information