Facilitating Consistency Check between Specification & Implementation with MapReduce Framework

Similar documents
Big Data for Engineers Spring Resource Management

Distributed Face Recognition Using Hadoop

Performance Evaluation of A Testing Framework using QuickCheck and Hadoop

Big Data 7. Resource Management

MI-PDB, MIE-PDB: Advanced Database Systems

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Hadoop/MapReduce Computing Paradigm

CCA-410. Cloudera. Cloudera Certified Administrator for Apache Hadoop (CCAH)

Your First Hadoop App, Step by Step

A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop

Clustering Lecture 8: MapReduce

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Top 25 Hadoop Admin Interview Questions and Answers

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Ghislain Fourny. Big Data 6. Massive Parallel Processing (MapReduce)

Mixing and matching virtual and physical HPC clusters. Paolo Anedda

Correlation based File Prefetching Approach for Hadoop

A brief history on Hadoop

Evaluation of Apache Hadoop for parallel data analysis with ROOT

Ghislain Fourny. Big Data Fall Massive Parallel Processing (MapReduce)

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

Hortonworks HDPCD. Hortonworks Data Platform Certified Developer. Download Full Version :

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

Introduction to Hadoop and MapReduce

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?

Hadoop. copyright 2011 Trainologic LTD

Research and Realization of AP Clustering Algorithm Based on Cloud Computing Yue Qiang1, a *, Hu Zhongyu2, b, Lei Xinhua1, c, Li Xiaoming3, d

MapReduce, Hadoop and Spark. Bompotas Agorakis

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2013/14

Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2

Hadoop MapReduce Framework

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Cloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ]

MapReduce. U of Toronto, 2014

Distributed Systems. CS422/522 Lecture17 17 November 2014

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Introduction to MapReduce

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

Chapter 5. The MapReduce Programming Model and Implementation

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.

1. Introduction (Sam) 2. Syntax and Semantics (Paul) 3. Compiler Architecture (Ben) 4. Runtime Environment (Kurry) 5. Testing (Jason) 6. Demo 7.

Hadoop-PR Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer)

BigData and Map Reduce VITMAC03

Lecture 11 Hadoop & Spark

Vendor: Cloudera. Exam Code: CCA-505. Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH) CDH5 Upgrade Exam.

VIAF: Verification-based Integrity Assurance Framework for MapReduce. YongzhiWang, JinpengWei

Introduction to MapReduce. Instructor: Dr. Weikuan Yu Computer Sci. & Software Eng.

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Hortonworks PR PowerCenter Data Integration 9.x Administrator Specialist.

Lecture 30: Distributed Map-Reduce using Hadoop and Spark Frameworks

SGI Hadoop Based on Intel Xeon Processor E5 Family. Getting Started Guide

Decision analysis of the weather log by Hadoop

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

KillTest *KIJGT 3WCNKV[ $GVVGT 5GTXKEG Q&A NZZV ]]] QORRZKYZ IUS =K ULLKX LXKK [VJGZK YKX\OIK LUX UTK _KGX

Comparative Analysis of K means Clustering Sequentially And Parallely

An Enhanced Approach for Resource Management Optimization in Hadoop

HADOOP FRAMEWORK FOR BIG DATA

Parallel Genetic Algorithm to Solve Traveling Salesman Problem on MapReduce Framework using Hadoop Cluster

Distributed Filesystem

CS 345A Data Mining. MapReduce

Google File System (GFS) and Hadoop Distributed File System (HDFS)

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse

what is cloud computing?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

MapReduce-style data processing

PARALLELIZATION OF BACKWARD DELETED DISTANCE CALCULATION IN GRAPH BASED FEATURES USING HADOOP JAYACHANDRAN PILLAMARI. B.E., Osmania University, 2009

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

The MapReduce Framework

August Li Qiang, Huang Qiulan, Sun Gongxing IHEP-CC. Supported by the National Natural Science Fund

CS370 Operating Systems

Enhanced Hadoop with Search and MapReduce Concurrency Optimization

Introduction to MapReduce

2/26/2017. For instance, consider running Word Count across 20 splits

Vendor: Cloudera. Exam Code: CCD-410. Exam Name: Cloudera Certified Developer for Apache Hadoop. Version: Demo

itpass4sure Helps you pass the actual test with valid and latest training material.

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Database Applications (15-415)

Big Data Management and NoSQL Databases

Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14.

VMware vsphere Big Data Extensions Command-Line Interface Guide

Programming Systems for Big Data

Cloud Computing. Hwajung Lee. Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Motivation. Map in Lisp (Scheme) Map/Reduce. MapReduce: Simplified Data Processing on Large Clusters

Inria, Rennes Bretagne Atlantique Research Center

Introduction to MapReduce

Howdah. a flexible pipeline framework and applications to analyzing genomic data. Steven Lewis PhD

Distributed Systems 16. Distributed File Systems II

CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu

TP1-2: Analyzing Hadoop Logs

SSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh

L5-6:Runtime Platforms Hadoop and HDFS

VMware vsphere Big Data Extensions Command-Line Interface Guide

Getting Started with Spark

Introduction to Data Management CSE 344

Lecture 12 DATA ANALYTICS ON WEB SCALE

Next-Generation Cloud Platform

Transcription:

Facilitating Consistency Check between Specification & Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, Keijiro ARAKI Kyushu University, Japan

2 Our expectation Light-weight formal methods will become more popular. One of key factors: platform to test specifications. We are in Cloud era, and emerging cloud computing technologies are effective in developing platforms for LWFM? One of the points: elastic environment for various conditions increasing # nodes if time constraint gets tight enabling brute-force approaches if necessary

3 Conceptual scenario Cloud Service for VDM: Executable specification Implementation Property to be checked Configuration test data volume resource scale Cloud Service for VDM Report

4 Conceptual scenario Losing consistency between specification and implementation? Checking and analyzing consistency (extensively) between intentional world and extensional world to understand the current status, fix the situation / establish the final specification if necessary. (ill-managed) specification for development source code document / specification for maintenance

5 Bad news for our extensive approach execution time on single node [sec] 20000 15000 10000 5000 0 0 200 400 600 800 1000 # of tests 5 HIPS 2011

6 Introduce MapRedeuce to testing specification Can we use cloud technologies for expensive tests? MapReduce framework: Parallel & distributed processing without knowledge of parallelization, fault tolerance, data distribution, load balancing Prepare input file, mapper and reducer functions, then MapReduce system splits input file, distributes over multiple nodes, gathers and folds results.

7 Testing property for a set of data As principal abstractions, we often use sets, sequences, and mappings, in VDM. e.g. test scenario : property for a set of data exists x in set s & P(x) OR mapper P(x 0 ) P(),... reducer P(x k )... OR forall x in set s & P(x) AND mapper P(xP(), 0 )... reducer P(x k ) AND...

8 Key components in our approach Test data generation: QuickCheck, Map in Hadoop Test execution: VDMTools, Map in Hadoop Making summary reports: VDMTools, Reduce in Hadoop Hadoop: Open Source implementation of MapReduce in Java, including Distributed File System (HDFS),. In default, we write mapper and reducer in Java, Streaming allows other languages (scripts for VDM*) QuickCheck: Test tool for Haskell write property expression system must keep generate test cases based on the type of property & constraints can use to generate test data in other language than Haskell

9 Test with property-based data generation Generator: based on QuickCheck customized for distributed execution Side outputs: test coverage data, error info., 9

10 Model-based test Compare the results of VDM specification and corresponding implementation Generation or reuse of test data on HDFS property 1 VDM Specification Test Data Implementation Run VDM specification 2 3 Expected Output Actual Output Run implementation 4 Compare results Test Result

11 Performance evaluation Property-based test (with data generation) Model-based test (comparison of model & implementation) Configuration Master nodes:jobtracker, NameNode Slave nodes:(tasktracker + DataNode) X 1 8 NameNode JobTracker TaskTracker+ DataNode CPU Xeon E5420 2.50GHz Quad Core Xeon E5420 2.50GHz Quad Core Xeon X3320 2.5GHz Quad Core Memory 3.2GB 8.0GB 3.2GB Disk 2TB 1TB 140GB NIC 1Gbps 1Gbps 1Gbps

Property-based test - time 12

Model-based test - time 13

14 Concluding remarks Our work-in-progress report Parallel processing framework, Hadoop, seems useful for speeding up testing specifications in light-weight F.M. Develop specifications, and their extensive tests, in order to increase confidence (instead of proof) Different testing schemes by combinations of map/reduce e.g. Property-based test, Model-based test, Predictable performance in software projects, it seems hard to predict most cost-effective configuration performance can degrade in elastic cloud (virtual) environment