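Big Data Benchmarking: Principles, Methods, and Applications. Jianfeng Zhan, http://prof.ict.ac.cn/bigdatabench. Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences. 2015.7.31, 2015 Trusted Cloud Service Conference, Beijing.
Outline: Principles; Methods; BigDataBench
The significance of measurement: the foundation of science and of everyday human life. Newton (force), Kelvin (temperature), Watt (power).
Kelvin's famous saying: "If you can't measure it, you can't improve it."
The essence of big data benchmarking: building a ruler for measuring big data systems. Unfortunately, the systems are too complex, the applications too diverse, and the metrics not intuitive.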
What is a benchmark? "The process of running a specific program or workload on a specific machine or system and measuring the resulting performance." -- Saavedra, R. H., Smith, A. J.: Analysis of benchmark characteristics and benchmark performance prediction, ACM Transactions on Computer Systems, vol. 14, no. 4 (1996) 344-384
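A minimal sketch of that definition, assuming a hypothetical workload (`sort_workload`) and wall-clock time as the only metric. Neither choice comes from the cited paper; they are placeholders for whatever program and metric a real benchmark would specify.

```python
import statistics
import time


def run_benchmark(workload, repeats=5):
    """Run `workload` several times and return wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def sort_workload(n=100_000):
    """Hypothetical workload: sort a reversed list of n integers."""
    data = list(range(n, 0, -1))
    data.sort()
    return data


if __name__ == "__main__":
    timings = run_benchmark(sort_workload)
    print(f"median: {statistics.median(timings):.4f} s over {len(timings)} runs")
```

Reporting the median of repeated runs is one common way to reduce the effect of run-to-run noise; a real benchmark would also fix the machine, the inputs, and the reporting rules.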
What is a benchmark suite? A popular measure of performance with a variety of applications. To overcome the danger of placing too many eggs in one basket, the weakness of any one benchmark is lessened by the presence of the other benchmarks. Used to characterize relative performance, e.g., EEMBC, SPEC. -- Computer Architecture: A Quantitative Approach
Benchmarking principles: few works explicitly discuss benchmarking principles; they are usually an afterthought. However, implicit principles do exist.
What makes a good benchmark? Relevant, portable, scalable, simple.
TPC benchmarking principles: Relevant: meaningful within the target domain. Simple. Good metric(s): linear, orthogonal, monotonic. Portable: applicable to a broad spectrum of hardware/architecture. Coverage: does not oversimplify the typical environment. Acceptance: vendors and users embrace it. -- Charles Levine: TPC-C: The OLTP Benchmark, SIGMOD, 1997
SPEC principles (SPEC: Standard Performance Evaluation Corporation, originally the System Performance Evaluation Cooperative): Application-oriented: tests real-life situations. Portability: written in a platform-neutral programming language. Repeatable and reliable. Consistency and fairness: each specification must define clear rules for executing and reporting results. -- Renzo Angles: Benchmark Principles and Methods
Pros and cons of benchmarking: Good benchmarks: define the playing field; accelerate progress (engineers do a great job once the objective is measurable and repeatable); set the performance agenda; measure release-to-release progress. Benchmark abuse: benchmarketing; benchmark wars (more $ on ads than on development). -- TPC Benchmarks, talk by Charles Levine, 1997
Big data benchmarking challenges: One-size-fits-all vs. one-size-fits-a-bunch. Hardware: general-purpose vs. special-purpose. Data management: OLTP, NoSQL, DW, offline/interactive analytics, streaming. Diverse/representative vs. benchmark cost: an open problem, with increasing workloads, data, and software stacks. Simple (understandable) vs. complex. Specific vs. abstract: semantic-specific.
Outline: Principles; Methods; BigDataBench
Benchmark construction methods: Top-down (representative program selection): can yield accurate representations of the program space of interest, but it is usually impossible to make any form of hard statement about representativeness. Bottom-up (diverse range of characteristics): program characteristics are quantities that can be measured and compared, but not all portions of the characteristics space are equally important. -- C. Bienia. Benchmarking Modern Multiprocessors. PhD thesis, Princeton University, 2011.
TPC functional workload model: Application domain: encapsulates use cases. Functions of abstraction: abstractions of the implementations of use cases in different application domains. Systems view and physical view: different systems and hardware. -- Yanpei Chen, Francois Raab, Randy Katz: From TPC-C to Big Data Benchmarks: A Functional Workload Model, WBDB, 2012
Functional workload model: the functional view enables a large range of similarly targeted systems to be compared and allows the benchmark to scale and evolve.
TPC-C methodology: Functions of abstraction: a mid-weight read-write transaction (New-Order); a light-weight read-write transaction (Payment); a mid-weight read-only transaction (Order-Status); a batch of mid-weight read-write transactions (Delivery); a heavy-weight read-only transaction (Stock-Level). The functional workload model captures, in an implementation-independent manner, the load that the system needs to service.
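A functional workload model of this kind can be sketched as a weighted mix over the five functions of abstraction, independent of how any system implements them. The weights below are illustrative assumptions, not figures from the slides; the TPC-C specification defines the actual required mix. The names `TRANSACTION_MIX` and `generate_requests` are hypothetical.

```python
import random
from collections import Counter

# Illustrative weights in percent (assumed for this sketch, not from the slides).
TRANSACTION_MIX = {
    "New-Order": 45,
    "Payment": 43,
    "Order-Status": 4,
    "Delivery": 4,
    "Stock-Level": 4,
}


def generate_requests(count, mix=TRANSACTION_MIX, seed=0):
    """Return `count` transaction names drawn according to the weighted mix."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    requests = generate_requests(10_000)
    for name, n in Counter(requests).most_common():
        print(f"{name:<13}{n / len(requests):6.1%}")
```

Because the generator only emits function names, the same request stream could drive any system under test that implements those five transactions.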
The relational model for structured data: E. F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, vol. 13, no. 6, 1970. Set concept: general mathematical meaning; a general representation of data; the basis of relational algebra (the theoretical foundation of databases). Five basic operations: Select, Project, Product, Union, Difference.
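The five basic operations can be sketched over plain Python sets. This is a minimal illustration, assuming a hypothetical `Relation` type that stores attribute names plus a set of value tuples; it is not how a database engine implements them.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: tuple      # attribute names, in column order
    rows: frozenset   # set of value tuples, one per row


def select(r, predicate):
    """Select: keep the rows for which predicate(row_as_dict) is true."""
    return Relation(r.attrs, frozenset(
        t for t in r.rows if predicate(dict(zip(r.attrs, t)))))


def project(r, attrs):
    """Project: keep only the named columns; duplicate rows collapse in the set."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(
        tuple(t[i] for i in idx) for t in r.rows))


def product(r, s):
    """Cartesian product: every row of r paired with every row of s."""
    if set(r.attrs) & set(s.attrs):
        raise ValueError("attribute names must be disjoint")
    return Relation(r.attrs + s.attrs, frozenset(
        a + b for a in r.rows for b in s.rows))


def union(r, s):
    """Union of two relations over the same attributes."""
    if r.attrs != s.attrs:
        raise ValueError("relations must have the same attributes")
    return Relation(r.attrs, r.rows | s.rows)


def difference(r, s):
    """Difference: rows of r that do not appear in s."""
    if r.attrs != s.attrs:
        raise ValueError("relations must have the same attributes")
    return Relation(r.attrs, r.rows - s.rows)


if __name__ == "__main__":
    employees = Relation(("name", "dept"), frozenset({("Ann", "db"), ("Bob", "os")}))
    depts = Relation(("dept_id", "floor"), frozenset({("db", 3), ("os", 5)}))
    # A join expressed with the basic operations: product, then select, then project.
    joined = select(product(employees, depts),
                    lambda row: row["dept"] == row["dept_id"])
    print(sorted(project(joined, ["name", "floor"]).rows))
```

The usage example shows why five operations suffice as a basis: a join is not primitive, it is a product followed by a select.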
Parallel computing abstractions: by a multidisciplinary group of well-known researchers, e.g., Jim Gray, Michael Jordan, David A. Patterson. Operations & patterns: abstracted from 13 representative parallel computation patterns. Parallel computation: an inherent demand of big data processing (volume & complexity).
Outline: Principles; Methods; BigDataBench
What is BigDataBench? An open-source big data benchmarking project, http://prof.ict.ac.cn/bigdatabench. Search Google for "BigDataBench".
BigDataBench 3.1 overview: BDGS (Big Data Generator Suite) for scalable data. 14 real-world data sets: Wikipedia Entries, Amazon Movie Reviews, Google Web Graph, Facebook Social Network, E-commerce Transaction, ImageNet, English broadcasting audio, ProfSearch Resumes, DVD Input Streams, Image scene, SoGou Data, Genome sequence data, Assembly of the human genome, MNIST. 33 workloads across five domains: Search Engine, Multimedia, Social Network, E-commerce, Bioinformatics. Software stacks: NoSQL, Impala, Shark, Hadoop, RDMA, MPI, DataMPI.
Why use BigDataBench?
Benchmark       Specification  Application domains  Workload types  Workloads  Scalable data sets (from real data)  Multiple implementations  Multitenancy  Subsets  Simulator version
BigDataBench    Y              Five                 Four [1]        33         8                                    Y                         Y             Y        Y
BigBench        Y              One                  Three           10         3                                    N                         N             N        N
Cloud-Suite     N              N/A                  Two             8          3                                    N                         N             N        Y
HiBench         N              N/A                  Two             10         3                                    N                         N             N        N
CALDA           Y              N/A                  One             5          N/A                                  Y                         N             N        N
YCSB            Y              N/A                  One             6          N/A                                  Y                         N             N        N
LinkBench       Y              N/A                  One             10         N/A                                  Y                         N             N        N
AMP Benchmarks  Y              N/A                  One             4          N/A                                  Y                         N             N        N
[1] The four workload types are Offline Analytics, Cloud OLTP, Interactive Analytics, and Online Service.
BigDataBench users: http://prof.ict.ac.cn/bigdatabench/users/. Industry users: Accenture, Broadcom, Samsung, Huawei, IBM. About 20 academic groups have published papers using BigDataBench. BigDataBench support for Flink.
Industry standard: BigDataBench-DCA. China's first industry-standard big data benchmark suite, http://prof.ict.ac.cn/bigdatabench/industrystandard-benchmarks/. Telecom Research Institute of the Ministry of Industry and Information Technology, ICT, CAS, Huawei, China Mobile, Sina, ZTE, Intel (China), Microsoft (China), IBM CDL, Baidu, INSPUR, 21Vianet, and UCloud.
BigDataBench papers and technical reports:
BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014).
Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award).
BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014).
Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872.
BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205.
BigDataBench principles: Data-centric: supports different types of raw data. Application-centric: independent of specific HW/SW components. Coverage: representative workloads that reflect the diversity of application scenarios; representative software stacks. Scalable & extensible: easy to add new workloads and to support new software stacks. Usability: easy for users to deploy, configure, and run.
BigDataBench methodology [diagram]: Application domains 1 to N each yield a benchmark specification (1 to N). The specifications cover data models of different types & semantics and data operations & workload patterns. They lead to real-world data sets with data generation tools and to workloads with diverse implementations. Two derived versions: the multi-tenancy version (mix with different percentages) and the BigDataBench subset (reduce benchmarking cost).
Five application domains: search engine, social network, e-commerce, multimedia, bioinformatics. Internet services (search engine, social network, e-commerce) take up 80% of internet services according to the page views and daily visitors of the top 20 websites. [Figures: share of the top 20 websites by category (search engine, social network, electronic commerce, media streaming, others); multimedia data growth (new videos on YouTube, music streaming on Pandora, voice calls on Skype, and new photos on Flickr every minute; video feeds from surveillance cameras; images, videos, documents); DDBJ/EMBL/GenBank database growth in nucleotides (billion) and entries (million).] Sources: http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-dkb-v2.pdf http://www.alexa.com/topsites/global;0 http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph
Dwarfs of big data analytics: a minimum set to represent maximum patterns of big data analytics.
Dwarfs of offline big data analytics: linear algebra, sampling, transform operations, graph operations, logic operations, set operations, statistic operations, sort.
DAG-like composition. Example: feature extraction, SIFT algorithm.
Workloads and data sets: Data model: structured, semi-structured, unstructured. Semantics: text, graph, table, multimedia. Data operations: unit of computation. Workload patterns: different combinations of units of computation.
Benchmark specification: guidelines for BigDataBench implementation. Data model and workloads: describe the data model, model typical application scenarios, extract important workloads.
Specification 1, search engine: general search and vertical search; online server and offline analytics.
Specification, multimedia: voice data extraction, speech recognition, video data, MPEG decoder, frame data extraction, feature extraction, image segmentation, face detection, three-dimensional reconstruction, tracing.
Big data generator suite: three kinds of big data generators (text, graph, table) that preserve the original characteristics of real data.
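One simple way to scale data while preserving a characteristic of the real data is to fit a distribution on a small seed corpus and sample from it to any target size. The sketch below uses a unigram word-frequency model purely for illustration; it is not the BDGS implementation, and the names `fit_unigram` and `generate_text` are hypothetical.

```python
import random
from collections import Counter


def fit_unigram(seed_text):
    """Estimate word frequencies from a small real-world seed text."""
    counts = Counter(seed_text.lower().split())
    words = sorted(counts)  # sorted for a stable, repeatable ordering
    return words, [counts[w] for w in words]


def generate_text(model, num_words, seed=0):
    """Sample `num_words` words so that frequencies follow the seed text."""
    words, weights = model
    rng = random.Random(seed)
    return " ".join(rng.choices(words, weights=weights, k=num_words))


if __name__ == "__main__":
    seed_text = "big data systems need big data benchmarks and big data generators"
    model = fit_unigram(seed_text)
    synthetic = generate_text(model, num_words=1000)
    print(Counter(synthetic.split()).most_common(3))
```

A unigram model preserves only word frequencies; richer models would be needed to preserve topics or word order.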
BigDataBench multi-tenancy version: Scenarios of multiple tenants running heterogeneous applications in cloud datacenters: latency-critical online services; latency-insensitive offline batch applications. Benchmarking scenarios: mining real-world workload traces (Google and Facebook); profiling real-world workload traces; workload matching using machine learning techniques; parametric workload generation tool. Mixed workloads in public clouds; data analytics workloads in private clouds.
BigDataBench subset: Motivation: it is expensive to run all the benchmarks for system and architecture research, multiplied by different implementations; BigDataBench 3.0 provides about 77 workloads. Approach: eliminate correlated data; identify workload characteristics from a specific perspective; dimension reduction (PCA); clustering (K-Means); subset.
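The PCA and K-Means steps above can be sketched as follows, assuming NumPy is available. The metric values, the workload names, and the rule of picking the workload nearest each cluster center are illustrative assumptions, not the procedure or data used for BigDataBench.

```python
import numpy as np


def pca(X, num_components):
    """Standardize each metric, then project onto the top principal components."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant metrics
    Z = Xc / std
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:num_components].T


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
    return np.argmin(dists, axis=1)


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return assign(points, centers), centers


def select_subset(names, metrics, k, num_components=2):
    """Pick one representative workload per cluster (nearest to its center)."""
    reduced = pca(np.asarray(metrics, dtype=float), num_components)
    labels, centers = kmeans(reduced, k)
    subset = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            gaps = np.linalg.norm(reduced[members] - centers[j], axis=1)
            subset.append(names[members[np.argmin(gaps)]])
    return subset


if __name__ == "__main__":
    # Hypothetical workloads and made-up metric values (three metrics each).
    names = ["w0", "w1", "w2", "w3", "w4", "w5"]
    metrics = [[1.0, 10.0, 100], [1.1, 10.5, 101], [0.9, 9.5, 99],
               [5.0, 50.0, 10], [5.2, 51.0, 11], [4.8, 49.0, 9]]
    print(select_subset(names, metrics, k=2))
```

Running one representative per cluster instead of every workload is what reduces the benchmarking cost; the number of clusters `k` trades cost against coverage.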
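Ongoing work: streaming, with ECNU and Renmin University of China.
Conclusion: reviewed and summarized the principles and methods of benchmarking; introduced an open-source big data benchmark suite, BigDataBench.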