DCBench: a Data Center Benchmark Suite


DCBench: a Data Center Benchmark Suite Zhen Jia ( 贾禛 ) http://prof.ict.ac.

1 DCBench: a Data Center Benchmark Suite. Zhen Jia (贾禛), Institute of Computing Technology, Chinese Academy of Sciences. Workshop in conjunction with CCF, October 31, 2013, Guilin

2 Workload Spectrum: CPU intensive, memory intensive, I/O intensive (figure from Intel)

3 Workload Spectrum Data Centers

4 Why Benchmarking? Sometimes there is a solution.

5 Why Benchmarking? What about the solution when

6 Benchmark's Role in Computer Science: Benchmarking is the quantitative foundation of computer system and architecture research. Benchmarks are used to experimentally determine the benefits of new designs. C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. PACT 2008

7 State of Practice Benchmark Suites: SPEC CPU, SPEC Web, HPCC, PARSEC, TPC-C, Gridmix, YCSB

8 Data Centers [1] Distinguishing features: massive scale, mixed workloads. Workload classification: online services (service), e.g. Web search; offline processing (data analysis), e.g. MapReduce programs. [1] Barroso et al., The Datacenter as a Computer, 2009

9 Previous Work: CloudSuite (Ferdman et al., Clearing the Clouds, ASPLOS 2012). Six scale-out workloads: Web search, Web serving, Media streaming, Data serving, Data analytics (Bayes), Software testing. Most are service workloads; Data analytics (Bayes) is the only data analysis workload. The suite leans toward service workloads!

10 Scale-out Performance of Data Analysis Workloads [Figure: speed-up of data analysis workloads, including the CloudSuite data analytics (Bayes) workload] Data analysis workloads are diverse!

11 Content Background and Motivation DCBench Workloads Characterization

12 DCBench Workloads: scale-out service, data analysis, VM operation. Released in July 2013. Web site:

13 Methodology for Choosing Workloads. Step 1: Rank the main websites and web services by page views and daily visitors. Step 2: Decompose the service programs into algorithms and basic operations. Step 3: Select algorithms and basic operations according to their popularity.
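The three steps can be read as a small selection pipeline. The sketch below is only an illustration under stated assumptions: the traffic shares, the service-to-algorithm mapping, and the names `TRAFFIC_SHARE`, `ALGORITHMS_BY_SERVICE`, and `select_algorithms` are hypothetical and do not come from the DCBench release.

```python
from collections import defaultdict

# Step 1 (assumed input): share of page views per service category, in percent.
# These numbers are placeholders for illustration, not measured data.
TRAFFIC_SHARE = {
    "search engine": 50,
    "social network": 30,
    "electronic commerce": 20,
}

# Step 2 (assumed input): algorithms that each service decomposes into.
ALGORITHMS_BY_SERVICE = {
    "search engine": ["pagerank", "grep", "sort", "word count"],
    "social network": ["recommendation", "clustering", "grep"],
    "electronic commerce": ["recommendation", "classification"],
}


def select_algorithms(traffic_share, algorithms_by_service, top_n):
    """Step 3: rank algorithms by the total traffic of the services that use them."""
    popularity = defaultdict(int)
    for service, algorithms in algorithms_by_service.items():
        for algorithm in algorithms:
            popularity[algorithm] += traffic_share.get(service, 0)
    # Sort by descending popularity, then by name so that ties are deterministic.
    ranked = sorted(popularity.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(select_algorithms(TRAFFIC_SHARE, ALGORITHMS_BY_SERVICE, top_n=3))
```

With these placeholder inputs, grep scores highest because two services use it; the remaining ties are broken alphabetically, which is an arbitrary choice made for the sketch.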

14 Step 1: Ranking

15 Top Sites on the Web [Pie chart of top sites by category (Search Engine, Electronic Commerce, Social Network, Media Streaming, Others) with slices of 5%, 15%, 15%, 40%, and 25%] More details in

16 Step 2: Decomposing

17 Algorithms in Search Engine: graph mining, grep & segmentation, word count, PageRank, sort, vector calculation (figure from The Anatomy of a Large-Scale Hypertextual Web Search Engine)

18 Algorithms in Recommendation Subsystems

19 Summary of Anatomy of Common Services [Pie chart: Top Sites on the Web, by category: Search Engine, Electronic Commerce, Social Network, Media Streaming, Others] Algorithms used in search, social network, and electronic commerce services: PageRank, segmentation, feature reduction, grep, statistical counting, sort, recommendation, clustering, classification, association rule mining, warehouse operation. Several of them, such as recommendation and statistical counting, appear under more than one service.

20 Step 3: Selecting

21 Top Operations and Algorithms: Grep, PageRank, Recommendation [Pie chart: Top Sites on the Web, by category: Search Engine, Electronic Commerce, Social Network, Media Streaming, Others]

22 Main Algorithms in Data Centers: basic operation, segmentation, classification, warehouse operation, cluster, feature reduction, recommendation, association rule mining, vector calculation, graph mining

23 Overview of DCBench
Category                 Workload                            Programming model  Language  Source
Basic operation          Sort                                MapReduce          Java      Hadoop
Basic operation          Wordcount                           MapReduce          Java      Hadoop
Basic operation          Grep                                MapReduce          Java      Hadoop
Classification           Naïve Bayes                         MapReduce          Java      Mahout
Classification           Support Vector Machine              MapReduce          Java      Implemented by ourselves
Cluster                  K-means                             MapReduce          Java      Mahout
Cluster                  K-means                             MPI                C++       IBM PML
Cluster                  Fuzzy k-means                       MapReduce          Java      Mahout
Cluster                  Fuzzy k-means                       MPI                C++       IBM PML
Recommendation           Item-based Collaborative Filtering  MapReduce          Java      Mahout
Association rule mining  Frequent pattern growth             MapReduce          Java      Mahout
Segmentation             Hidden Markov model                 MapReduce          Java      Implemented by ourselves

24 Overview of DCBench (Cont.)
Category             Workload                             Programming model  Language  Source
Warehouse operation  Database operations                  MapReduce          Java      Hive-bench
Feature reduction    Principal Component Analysis         MPI                C++       IBM PML
Feature reduction    Kernel Principal Component Analysis  MPI                C++       IBM PML
Vector calculation   Paper similarity analysis            All-Pairs          C & C++   Implemented by ourselves
Graph mining         Breadth-first search                 MPI                C++       Graph500
Graph mining         PageRank                             MapReduce          Java      Mahout
Service              Search engine                        C/S                Java      Implemented by ourselves
Service              Auction                              C/S                Java      RUBiS
Service              Media streaming                      C/S                Java      CloudSuite

25 Content Background and Motivation DCBench Workloads Characterization [2]. [2] Zhen Jia et al., Characterizing Data Analysis Workloads in Data Centers, IISWC 2013 (Best Paper)

26 Compared Benchmarks
Field                Benchmarks and workloads
Scale-out workloads  CloudSuite v1: Web search, Data serving, Web serving, Media streaming, Software testing
HPC                  HPCC: HPL, STREAM, PTRANS, RandomAccess, DGEMM, FFT, COMM
CPU                  SPEC CPU 2006 (SPEC INT, SPEC FP), PARSEC
Web                  SPEC Web 2005, TPC-W
Scale-out service workloads share many characteristics with traditional service workloads, so we simply use "service workloads" to refer to both.

27 Breakdown of Executed Instructions [Figure: stacked bars, 0% to 100%, of kernel vs. application instructions for the data analysis workloads (Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg) and for the service and comparison workloads (Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECWeb, TPC-W, SPECFP, SPECINT, PARSEC, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM)] Data analysis workloads execute more application-level instructions. Service workloads have higher percentages of kernel-level instructions.
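A breakdown of this kind can be derived from two instruction counts, one per privilege level. The sketch below assumes such counts are already available (for example from hardware performance counters); the function name `instruction_breakdown` and the sample numbers are hypothetical and are not the measurements behind the slide.

```python
def instruction_breakdown(kernel_instructions, application_instructions):
    """Return (kernel %, application %) of all counted instructions."""
    total = kernel_instructions + application_instructions
    if total <= 0:
        raise ValueError("no instructions were counted")
    kernel_pct = 100.0 * kernel_instructions / total
    # The two shares always sum to 100%.
    return kernel_pct, 100.0 - kernel_pct


if __name__ == "__main__":
    # Made-up counts: 25 kernel-level and 75 application-level instructions.
    kernel_pct, app_pct = instruction_breakdown(25, 75)
    print(f"kernel {kernel_pct:.1f}%, application {app_pct:.1f}%")
```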

28 Architecture Block Diagram Figure from Intel

29 Pipeline Stalls [Figure: pipeline stall breakdown; groups: data analysis, service] Service workloads have more RAT (Register Allocation Table) stalls. Data analysis workloads have more RS (Reservation Station) full and ROB (ReOrder Buffer) full stalls. Front-end stalls!

30 Main reason for pipeline stalls: the memory wall (figure from The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms)

31 Reasons for Front-End Stalls: High ICache and ITLB miss counts cause front-end stalls. [Figure: L1 ICache misses per thousand instructions; groups: data analysis, service]
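Misses per thousand instructions (often written MPKI) normalizes a miss count by the amount of work done, so workloads of different lengths can be compared. A minimal sketch, assuming raw miss and instruction counts are available; the function name and the sample numbers are hypothetical.

```python
def misses_per_kilo_instruction(misses, instructions):
    """Misses per thousand retired instructions (MPKI)."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * misses / instructions


if __name__ == "__main__":
    # Made-up counts: 2,500 L1 ICache misses over 100,000 instructions.
    print(misses_per_kilo_instruction(2_500, 100_000))
```

The same function applies to any miss counter, for example ITLB misses or L2 cache misses.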

32 L2 Cache Behaviors: Data analysis workloads have good L2 cache behavior. [Figure: L2 cache misses per thousand instructions; groups: data analysis, service]

33 LLC Behaviors: Data center workloads have good LLC behavior, better than most of the HPC workloads. [Figure: percentage of L2 misses satisfied by L3, 0% to 100%]
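The plotted metric can be computed from two miss counts. The sketch below assumes that every L2 miss is looked up in L3, so the L2 misses that do not also miss in L3 were satisfied there; it ignores prefetch traffic. The function name and the sample numbers are hypothetical.

```python
def l3_satisfied_percentage(l2_misses, l3_misses):
    """Percentage of L2 misses that hit in L3 (assumes each L2 miss probes L3)."""
    if l2_misses <= 0:
        raise ValueError("L2 miss count must be positive")
    if not 0 <= l3_misses <= l2_misses:
        raise ValueError("L3 misses must be between 0 and the L2 miss count")
    return 100.0 * (l2_misses - l3_misses) / l2_misses


if __name__ == "__main__":
    # Made-up counts: 1,000 L2 misses, of which 150 also miss in L3.
    print(l3_satisfied_percentage(1_000, 150))
```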

34 Branch Prediction: Data analysis workloads have good branch behavior. Branches of service workloads are hard to predict. [Figure: branch misprediction ratio, 0% to 12%; groups: data analysis, service]

35 Some Observations: Data analysis workloads differ from scale-out service workloads and traditional workloads. Data analysis workloads execute more application-level instructions. They have high ICache and ITLB miss counts. Impact: a high percentage of front-end stalls. Cause: the massive scale of the software infrastructure, high-level languages, and third-party libraries. Implication: rethink the design of the ICache or ITLB, or simplify the software stack. Low-level caches work well for data analysis workloads, so pay more attention to the area and energy of caches. The branch predictor is quite effective.

36 More information:

37 Back up

38 Data Center vs. Big Data [Diagram with two regions, Data center and Big Data; labels: scale-out service, VM operation, big data analytics, data-intensive HPC]

39 Each Algorithm's Application Scenarios
Sort: ranking pages according to their importance (PageRank); sorting pages by ID (Web storage in database)
Wordcount: calculating TF-IDF base information, such as term frequency; counting user operations to analyze social behavior (in Wolfram Alpha)
Grep: log analysis; Web information extraction; fuzzy search
Naïve Bayes: spam recognition (Spam Filtering with Naive Bayes); bioinformatics (Naïve Bayesian Classifier for Rapid Assignment of RNA Sequences into the New Bacterial Taxonomy)
Support Vector Machine: classification (Question Classification); image processing (image annotation); text categorization

40 Each Algorithm's Application Scenarios (Cont.)
K-means: image processing (fast image segmentation); high-resolution landform classification
Item-based Collaborative Filtering: Amazon recommender system
Hidden Markov model: bioinformatics (protein homology detection); speech recognition; handwriting recognition; word segmentation
Frequent pattern growth: market analysis; data mining in business (identifying competitive suppliers in Supply Chain Management); intrusion detection; query recommendation
Warehouse operation: Taobao Yunti system; Facebook; Yahoo!
Principal Component Analysis: computer vision; pattern recognition; face representation and recognition

Characterizing Data Analysis Workloads in Data Centers

Characterizing Data Analysis Workloads in Data Centers Zhen Jia 1,2, Lei Wang 1,2, Jianfeng Zhan 1*, Lixin Zhang 1, and Chunjie Luo 1 1 State Key Laboratory Computer Architecture, Institute of Computing