Modeling and evaluation on Ad hoc query processing with Adaptive Index in Map Reduce Environment
|
|
- Elvin Fowler
- 5 years ago
- Views:
Transcription
1 DEIM Forum 213 F2-1 Adaptive indexing MapReduce MapReduce MapReduce Modeling and evaluation on Ad hoc query processing with Adaptive Index in Map Reduce Environment Shohei OKUDERA, Daisaku YOKOYMA, Miyuki NAKANO, and Masaru KITSUREGAWA Abstract Institute of Industrial Science the University of Tokyo Komaba Meguro-ku Tokyo Japan {okudera,yokoyama,miyuki,kitsure}@tkl.iis.u-tokyo.ac.jp We are required to process large-scale data efficiently as the volume of digital data we generate increases rapidly. MapReduce environment is suitable for large-scale distributed data processing and getting more important as a ad-hoc data analysis platform. However, all data scan is required every query when performing ad-hoc analysis in MapReduce environment, so the I/O cost will be badly huge. In this paper, in order to reduce the I/O cost, we propose an integrated MapReduce framework with adaptive index. This framework gradually shrinks the processing cost, as those queries are issued, each of which has a selection condition on the same column but different selection values. And the framework can keep the performance when data updates happens. We applied our proposed adaptive indexing technique into basic data operations projection, aggregation, join and evaluated the reduction of the execution time. We conducted the simulation and compared the accumulated execution time in the workload where the queries with the selection conditions slightly changed are repeatedly issued. As a result of simulation, our proposed framework showed higher performance than those processing without adaptive indexing. Key words MapReduce Adaptive indexing Ad-hoc query Simulation 1. SNS Facebook [1], [2], ebay 2 3 5
2 Web. MapReduce [3] Hadoop [4] MapReduce [5] MapReduce MapReduce MapReduce 2 MapReduce 3 4 MapReduce MapReduce MapReduce 3 discount (1) SELECT id, discount FROM R WHERE x <= discount < y ; (2) (3) SELECT S. partid, S. partname, R. discount FROM R, S WHERE x <= R. discount < y AND R. partid = S. partid 2. 1 Hadoop MapReduce. MapReduce MapReduce Map Reduce 2 Map Hadoop HDFS map Reduce Map Reduce HDFS 1 Map 2 < = discount < 8 Map Map Map Collect Spill Merge Reduce. Reduce Map Reduce HDFS < = discount < 5 MapReduce SELECT partid, AVE( d i s c o u n t ) FROM R WHERE x <= discount < y GROUP BY parrid
3 3. MapReduce Adaptive index [6], [7] MapReduce Adaptive indexing I/O MapReduce Map Map Map Map HDFS HDFS Adaptive indexing Database Cracking [6] Map Map 2 < = discount < 8 discount < 2, 2 < = discount < 8, 8 < = discount 3 2, 8 ) HDFS
4 Map Map HDFS HDFS all-delete HDFS split-deleteHDFS HDFS hash hash hash hash 5 3.partial-update HDFS a ) Map b )
5 4. [8] Hadoop [8] I/O CPU Map 1 Scan,Map, Collect,Spill,Merge 5 Map Scan I/O CPU Intel Xeon E GHz 2P/8C DDR3 24GB, HDD 1 HDD 115MBps OS Linux2.6.32, Hadoop.2.3 1GBps MapRedcue R 1TB S 1GB R id S partid 2 2 x, y.1% 8% x, y.1% discount MapReduce Map 512MB, Reduce (1) 6 6 (no-index) ,8 (adaptiveindex) , 1 5% 2
6 no-index adaptive index (2) 1% 7 (no-index) , ,24 1 5% no-index adaptive index (3) (3) R S 8 (no-index) , , % no-index adaptive index R 1TB 5 2 ID 9 8%.1% 9 (no-index) (partial-update) split-delete
7 (partial-update) (partial-update) no-index all-delete split-delete partial-update [9] with-value 1stQuery sort 1stQuery index pointer MapReduce with-value 1stQuery sort 1stQuery index pointer
8 with-value 1stQuery sort 1stQuery index pointer 6. [9] [12] Dittrich [11] IO 12 Ritcher [9] Adaptive index [6], [7] Dremel [13] Dremel Dremel MapReduce I/O Map [1] M. Zuckerberg, One billion people on facebook, Oct One Billion People on Facebook [2] Big data: The next frontier for innovation, competition, and productivity, 211. [3] J. Dean and S. Ghemawat, Mapreduce: simplified data processing on large clusters, Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6, pp.1 1, OSDI 4, USENIX Association, 24. [4] hadoop [5] A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, and H. Liu, Data warehousing and analytics infrastructure at facebook, Proceedings of the 21 ACM SIGMOD International Conference on Management of data, pp , SIGMOD 1, ACM, 21. [6] S. Idreos, M.L. Kersten, and S. Manegold, Database cracking, CIDR, pp.68 78, 27. [7] S. Idreos, M.L. Kersten, and S. Manegold, Self-organizing tuple reconstruction in column-stores, Proceedings of the 35th SIGMOD international conference on Management of data, pp , SIGMOD 9, ACM, New York, NY, USA, [8] H. Herodotou, Hadoop performance models, Dec [9] S. Richter, J.-A. Quiané-Ruiz, S. Schuh, and J. Dittrich, Towards zero-overhead adaptive indexing in hadoop, 212. [1] D. Jiang, B.C. Ooi, L. Shi, and S. Wu, The performance of mapreduce: an in-depth study, Proc. VLDB Endow., vol.3, no.1-2, pp , Sept. 21. [11] J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad, Hadoop++: making a yellow elephant run like a cheetah (without it even noticing), Proc. VLDB Endow., vol.3, no.1-2, pp , Sept [12] J. Dittrich, J.-A. Quiané-Ruiz, S. Richter, S. Schuh, A. Jindal, and J. Schad, Only aggressive elephants are fast elephants, Proc. VLDB Endow., vol.5, no.11, pp , July [13] S. Melnik, A. Gubarev, J.J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis, Dremel: Interactive analysis of web-scale datasets, Proc. of the 36th Int l Conf on Very Large Data Bases, pp , MapRedcue
MINING OF LARGE SCALE DATA USING BESTPEER++ STRATEGY
MINING OF LARGE SCALE DATA USING BESTPEER++ STRATEGY *S. ANUSUYA,*R.B. ARUNA,*V. DEEPASRI,**DR.T. AMITHA *UG Students, **Professor Department Of Computer Science and Engineering Dhanalakshmi College of
More informationMulti-indexed Graph Based Knowledge Storage System
Multi-indexed Graph Based Knowledge Storage System Hongming Zhu 1,2, Danny Morton 2, Wenjun Zhou 3, Qin Liu 1, and You Zhou 1 1 School of software engineering, Tongji University, China {zhu_hongming,qin.liu}@tongji.edu.cn,
More informationLeanBench: comparing software stacks for batch and query processing of IoT data
Available online at www.sciencedirect.com Procedia Computer Science (216) www.elsevier.com/locate/procedia The 9th International Conference on Ambient Systems, Networks and Technologies (ANT 218) LeanBench:
More informationLoad Balancing Through Map Reducing Application Using CentOS System
Load Balancing Through Map Reducing Application Using CentOS System Nidhi Sharma Research Scholar, Suresh Gyan Vihar University, Jaipur (India) Bright Keswani Associate Professor, Suresh Gyan Vihar University,
More informationTHE data generated and stored by enterprises are in the
EQUI-DEPTH HISTOGRAM CONSTRUCTION FOR BIG DATA WITH QUALITY GUARANTEES 1 Equi-depth Histogram Construction for Big Data with Quality Guarantees Burak Yıldız, Tolga Büyüktanır, and Fatih Emekci arxiv:166.5633v1
More informationLarge Scale OLAP. Yifu Huang. 2014/11/4 MAST Scientific English Writing Report
Large Scale OLAP Yifu Huang 2014/11/4 MAST612117 Scientific English Writing Report 2014 1 Preliminaries OLAP On-Line Analytical Processing Traditional solutions: data warehouses built by parallel databases
More informationCIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu
CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin Presented by: Suhua Wei Yong Yu Papers: MapReduce: Simplified Data Processing on Large Clusters 1 --Jeffrey Dean
More informationQuery processing on raw files. Vítor Uwe Reus
Query processing on raw files Vítor Uwe Reus Outline 1. Introduction 2. Adaptive Indexing 3. Hybrid MapReduce 4. NoDB 5. Summary Outline 1. Introduction 2. Adaptive Indexing 3. Hybrid MapReduce 4. NoDB
More informationPerformance Comparison of Hive, Pig & Map Reduce over Variety of Big Data
Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Yojna Arora, Dinesh Goyal Abstract: Big Data refers to that huge amount of data which cannot be analyzed by using traditional analytics
More informationImproved MapReduce k-means Clustering Algorithm with Combiner
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationPivoting Entity-Attribute-Value data Using MapReduce for Bulk Extraction
Pivoting Entity-Attribute-Value data Using MapReduce for Bulk Extraction Augustus Kamau Faculty Member, Computer Science, DeKUT Overview Entity-Attribute-Value (EAV) data Pivoting MapReduce Experiment
More informationJumbo: Beyond MapReduce for Workload Balancing
Jumbo: Beyond Reduce for Workload Balancing Sven Groot Supervised by Masaru Kitsuregawa Institute of Industrial Science, The University of Tokyo 4-6-1 Komaba Meguro-ku, Tokyo 153-8505, Japan sgroot@tkl.iis.u-tokyo.ac.jp
More informationParallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce
Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over
More informationEvaluation of Multiple Fat-Btrees on a Parallel Database
DEIM Forum 2012 D10-3 Evaluation of Multiple Fat-Btrees on a Parallel Database Min LUO Takeshi MISHIMA Haruo YOKOTA Dept. of Computer Science, Tokyo Institute of Technology 2-12-1 Ookayama, Meguro-ku,
More informationRecurring Job Optimization for Massively Distributed Query Processing
Recurring Job Optimization for Massively Distributed Query Processing Nicolas Bruno nicolasb@microsoft.com Microsoft Corp. Sapna Jain sapnakjain@gmail.com IIT Bombay Jingren Zhou jrzhou@microsoft.com Microsoft
More informationOptimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*
Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Tharso Ferreira 1, Antonio Espinosa 1, Juan Carlos Moure 2 and Porfidio Hernández 2 Computer Architecture and Operating
More informationDremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Presented by: Sameer Agarwal sameerag@cs.berkeley.edu
More informationSURVEY ON BIG DATA TECHNOLOGIES
SURVEY ON BIG DATA TECHNOLOGIES Prof. Kannadasan R. Assistant Professor Vit University, Vellore India kannadasan.r@vit.ac.in ABSTRACT Rahis Shaikh M.Tech CSE - 13MCS0045 VIT University, Vellore rais137123@gmail.com
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data
More informationCombining MapReduce with Parallel DBMS Techniques for Large-Scale Data Analytics
EDIC RESEARCH PROPOSAL 1 Combining MapReduce with Parallel DBMS Techniques for Large-Scale Data Analytics Ioannis Klonatos DATA, I&C, EPFL Abstract High scalability is becoming an essential requirement
More informationIntroduction to MapReduce Algorithms and Analysis
Introduction to MapReduce Algorithms and Analysis Jeff M. Phillips October 25, 2013 Trade-Offs Massive parallelism that is very easy to program. Cheaper than HPC style (uses top of the line everything)
More informationMap/Reduce. Large Scale Duplicate Detection. Prof. Felix Naumann, Arvid Heise
Map/Reduce Large Scale Duplicate Detection Prof. Felix Naumann, Arvid Heise Agenda 2 Big Data Word Count Example Hadoop Distributed File System Hadoop Map/Reduce Advanced Map/Reduce Stratosphere Agenda
More informationBig Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012
Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema
More informationSMCCSE: PaaS Platform for processing large amounts of social media
KSII The first International Conference on Internet (ICONI) 2011, December 2011 1 Copyright c 2011 KSII SMCCSE: PaaS Platform for processing large amounts of social media Myoungjin Kim 1, Hanku Lee 2 and
More informationDelft University of Technology Parallel and Distributed Systems Report Series
Delft University of Technology Parallel and Distributed Systems Report Series An Empirical Performance Evaluation of Distributed SQL Query Engines: Extended Report Stefan van Wouw, José Viña, Alexandru
More informationResource and Performance Distribution Prediction for Large Scale Analytics Queries
Resource and Performance Distribution Prediction for Large Scale Analytics Queries Prof. Rajiv Ranjan, SMIEEE School of Computing Science, Newcastle University, UK Visiting Scientist, Data61, CSIRO, Australia
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More informationProcessing Load Prediction for Parallel FP-growth
DEWS2005 1-B-o4 Processing Load Prediction for Parallel FP-growth Iko PRAMUDIONO y, Katsumi TAKAHASHI y, Anthony K.H. TUNG yy, and Masaru KITSUREGAWA yyy y NTT Information Sharing Platform Laboratories,
More informationcalled Hadoop Distribution file System (HDFS). HDFS is designed to run on clusters of commodity hardware and is capable of handling large files. A fil
Parallel Genome-Wide Analysis With Central And Graphic Processing Units Muhamad Fitra Kacamarga mkacamarga@binus.edu James W. Baurley baurley@binus.edu Bens Pardamean bpardamean@binus.edu Abstract The
More informationP-Codec: Parallel Compressed File Decompression Algorithm for Hadoop
P-Codec: Parallel Compressed File Decompression Algorithm for Hadoop ABSTRACT Idris Hanafi and Amal Abdel-Raouf Computer Science Department, Southern Connecticut State University, USA Computers and Systems
More informationExploiting Bloom Filters for Efficient Joins in MapReduce
Exploiting Bloom Filters for Efficient Joins in MapReduce Taewhi Lee, Kisung Kim, and Hyoung-Joo Kim School of Computer Science and Engineering, Seoul National University 1 Gwanak-ro, Seoul 151-742, Republic
More informationPSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department
More informationDremel: Interactice Analysis of Web-Scale Datasets
Dremel: Interactice Analysis of Web-Scale Datasets By Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Presented by: Alex Zahdeh 1 / 32 Overview
More informationGlobal Journal of Engineering Science and Research Management
A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV
More informationSQL-to-MapReduce Translation for Efficient OLAP Query Processing
, pp.61-70 http://dx.doi.org/10.14257/ijdta.2017.10.6.05 SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce Hyeon Gyu Kim Department of Computer Engineering, Sahmyook University,
More informationHash Joins for Multi-core CPUs. Benjamin Wagner
Hash Joins for Multi-core CPUs Benjamin Wagner Joins fundamental operator in query processing variety of different algorithms many papers publishing different results main question: is tuning to modern
More informationSQL Query Optimization on Cross Nodes for Distributed System
2016 International Conference on Power, Energy Engineering and Management (PEEM 2016) ISBN: 978-1-60595-324-3 SQL Query Optimization on Cross Nodes for Distributed System Feng ZHAO 1, Qiao SUN 1, Yan-bin
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK EFFICIENT DATA PROCESSING IN PEER NETWORK USING CLOUD COMPUTING SHILPA VASANTRAO
More informationBig Data Hive. Laurent d Orazio Univ Rennes, CNRS, IRISA
Big Data Hive Laurent d Orazio Univ Rennes, CNRS, IRISA 2018-2019 Outline I. Introduction II. Data model III. Type system IV. Language 2018/2019 Hive 2 Outline I. Introduction II. Data model III. Type
More informationAn Efficient Distributed B-tree Index Method in Cloud Computing
Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 214, 8, 32-38 32 Open Access An Efficient Distributed B-tree Index Method in Cloud Computing Huang Bin 1,*
More informationSSD. DEIM Forum 2014 D8-6 SSD I/O I/O I/O HDD SSD I/O
DEIM Forum 214 D8-6 SSD, 153 855 4-6-1 135 8548 3-7-5 11 843 2-1-2 E-mail: {keisuke,haya,yokoyama,kitsure}@tkl.iis.u-tokyo.ac.jp, miyuki@sic.shibaura-it.ac.jp SSD SSD HDD 1 1 I/O I/O I/O I/O,, OLAP, SSD
More informationA What-if Engine for Cost-based MapReduce Optimization
A What-if Engine for Cost-based MapReduce Optimization Herodotos Herodotou Microsoft Research Shivnath Babu Duke University Abstract The Starfish project at Duke University aims to provide MapReduce users
More informationDCODE: A Distributed Column-Oriented Database Engine for Big Data Analytics
DCODE: A Distributed Column-Oriented Database Engine for Big Data Analytics Yanchen Liu, Fang Cao, Masood Mortazavi, Mengmeng Chen, Ning Yan, Chi Ku, Aniket Adnaik, Stephen Morgan, Guangyu Shi, Yuhu Wang,
More informationABSTRACT I. INTRODUCTION
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve
More informationParallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem
I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **
More informationMRBench : A Benchmark for Map-Reduce Framework
MRBench : A Benchmark for Map-Reduce Framework Kiyoung Kim, Kyungho Jeon, Hyuck Han, Shin-gyu Kim, Hyungsoo Jung, Heon Y. Yeom School of Computer Science and Engineering Seoul National University Seoul
More informationTwitter data Analytics using Distributed Computing
Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE
More informationJoin Processing for Flash SSDs: Remembering Past Lessons
Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationIntroduction to MapReduce. Adapted from Jimmy Lin (U. Maryland, USA)
Introduction to MapReduce Adapted from Jimmy Lin (U. Maryland, USA) Motivation Overview Need for handling big data New programming paradigm Review of functional programming mapreduce uses this abstraction
More informationMapReduce-II. September 2013 Alberto Abelló & Oscar Romero 1
MapReduce-II September 2013 Alberto Abelló & Oscar Romero 1 Knowledge objectives 1. Enumerate the different kind of processes in the MapReduce framework 2. Explain the information kept in the master 3.
More informationComparative Analysis of Range Aggregate Queries In Big Data Environment
Comparative Analysis of Range Aggregate Queries In Big Data Environment Ranjanee S PG Scholar, Dept. of Computer Science and Engineering, Institute of Road and Transport Technology, Erode, TamilNadu, India.
More informationAdaptive Join Plan Generation in Hadoop
Adaptive Join Plan Generation in Hadoop For CPS296.1 Course Project Gang Luo Duke University Durham, NC 27705 gang@cs.duke.edu Liang Dong Duke University Durham, NC 27705 liang@cs.duke.edu ABSTRACT Joins
More informationThings To Know. When Buying for an! Alekh Jindal, Jorge Quiané, Jens Dittrich
7 Things To Know When Buying for an! Alekh Jindal, Jorge Quiané, Jens Dittrich 1 What Shoes? Why Shoes? 3 Analyzing MR Jobs (HadoopToSQL, Manimal) Generating MR Jobs (PigLatin, Hive) Executing MR Jobs
More informationDremel: Interactive Analysis of Web- Scale Datasets
Dremel: Interactive Analysis of Web- Scale Datasets S. Melnik, A. Gubarev, J. Long, G. Romer, S. Shivakumar, M. Tolton Google Inc. VLDB 200 Presented by Ke Hong (slide adapted from Melnik s) Outline Problem
More informationImplementation of Aggregation of Map and Reduce Function for Performance Improvisation
2016 IJSRSET Volume 2 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Implementation of Aggregation of Map and Reduce Function for Performance Improvisation
More informationAndrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs
Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why
More informationSurvey Paper on Traditional Hadoop and Pipelined Map Reduce
International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,
More informationclass 5 column stores 2.0 prof. Stratos Idreos
class 5 column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ worth thinking about what just happened? where is my data? email, cloud, social media, can we design systems
More informationClash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics
Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Presented by: Dishant Mittal Authors: Juwei Shi, Yunjie Qiu, Umar Firooq Minhas, Lemei Jiao, Chen Wang, Berthold Reinwald and Fatma
More informationA Survey: Dynamic Allocation of Slots for Map Reduce Workload.
ISSN 2395-1621 A Survey: Dynamic Allocation of Slots for Map Reduce Workload. #1 Dhananjay A. Ahire, #2 Narayan K. Dongare, #3 Pooja P. Wankhede, #4 Chiranjivi D. Mane, #5 Prof. Devendra Gadekar 1 ahire.dhunanjay12@gmail.com
More informationANALYZING CHARACTERISTICS OF PC CLUSTER CONSOLIDATED WITH IP-SAN USING DATA-INTENSIVE APPLICATIONS
ANALYZING CHARACTERISTICS OF PC CLUSTER CONSOLIDATED WITH IP-SAN USING DATA-INTENSIVE APPLICATIONS Asuka Hara Graduate school of Humanities and Science Ochanomizu University 2-1-1, Otsuka, Bunkyo-ku, Tokyo,
More informationApache Drill. Interactive Analysis of Large-Scale Datasets. Tomer Shiran
Apache Drill Interactive Analysis of Large-Scale Datasets Tomer Shiran Latency Matters Ad-hoc analysis with interactive tools Real-time dashboards Event/trend detection Network intrusions Fraud Failures
More informationJournal of East China Normal University (Natural Science) Data calculation and performance optimization of dairy traceability based on Hadoop/Hive
4 2018 7 ( ) Journal of East China Normal University (Natural Science) No. 4 Jul. 2018 : 1000-5641(2018)04-0099-10 Hadoop/Hive 1, 1, 1, 1,2, 1, 1 (1., 210095; 2., 210095) :,, Hadoop/Hive, Hadoop/Hive.,,
More informationA Cloud Storage Adaptable to Read-Intensive and Write-Intensive Workload
DEIM Forum 2011 C3-3 152-8552 2-12-1 E-mail: {nakamur6,shudo}@is.titech.ac.jp.,., MyCassandra, Cassandra MySQL, 41.4%, 49.4%.,, Abstract A Cloud Storage Adaptable to Read-Intensive and Write-Intensive
More informationMapReduce: A Programming Model for Large-Scale Distributed Computation
CSC 258/458 MapReduce: A Programming Model for Large-Scale Distributed Computation University of Rochester Department of Computer Science Shantonu Hossain April 18, 2011 Outline Motivation MapReduce Overview
More informationGuoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14.
Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14 Page 1 Introduction & Notations Multi-Job optimization Evaluation Conclusion
More informationCIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )
Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL
More informationOn The Fly Mapreduce Aggregation for Big Data Processing In Hadoop Environment
ISSN (e): 2250 3005 Volume, 07 Issue, 07 July 2017 International Journal of Computational Engineering Research (IJCER) On The Fly Mapreduce Aggregation for Big Data Processing In Hadoop Environment Ms.
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model
More informationI. Introduction. FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data Rini T Kaushik 1
FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data Rini T Kaushik 1 1 IBM Research - Almaden Abstract High performance storage layer is vital for allowing interactive
More informationReplica Parallelism to Utilize the Granularity of Data
Replica Parallelism to Utilize the Granularity of Data 1st Author 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl. country code 1st author's E-mail address 2nd Author
More informationAdaptive Cache Mode Selection for Queries over Raw Data
Adaptive Cache Mode Selection for Queries over Raw Data Tahir Azim, Azqa Nadeem and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne TU Delft RAW Labs SA tahir.azim@epfl.ch, A.Nadeem@student.tudelft.nl,
More informationA Comparison of MapReduce Join Algorithms for RDF
A Comparison of MapReduce Join Algorithms for RDF Albert Haque 1 and David Alves 2 Research in Bioinformatics and Semantic Web Lab, University of Texas at Austin 1 Department of Computer Science, 2 Department
More informationclass 17 updates prof. Stratos Idreos
class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationModelling of Map Reduce Performance for Work Control and Supply Provisioning
Modelling of Map Reduce Performance for Work Control and Supply Provisioning Palson Kennedy.R* CSE&PERIIT, Anna University, Chennai palsonkennedyr@yahoo.co.in Kalyana Sundaram.N IT & Vels University hereiskalyan@yahoo.com
More informationMAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti
International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department
More informationDistributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud
Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud R. H. Jadhav 1 P.E.S college of Engineering, Aurangabad, Maharashtra, India 1 rjadhav377@gmail.com ABSTRACT: Many
More informationLITERATURE SURVEY (BIG DATA ANALYTICS)!
LITERATURE SURVEY (BIG DATA ANALYTICS) Applications frequently require more resources than are available on an inexpensive machine. Many organizations find themselves with business processes that no longer
More informationA Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture
A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses
More informationData Quality for Web Log Data Using a Hadoop Environment
Data Quality for Web Log Data Using a Hadoop Environment 21 QISHAN YANG, Insight Centre for Data Analytics, School of Computing, Dublin City University, Dublin, Ireland (qishan.yang@insight-centre.org)
More informationUday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved
Implementation of K-Means Clustering Algorithm in Hadoop Framework Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India Abstract Drastic growth
More informationImproving Efficiency of Parallel Mining of Frequent Itemsets using Fidoop-hd
ISSN 2395-1621 Improving Efficiency of Parallel Mining of Frequent Itemsets using Fidoop-hd #1 Anjali Kadam, #2 Nilam Patil 1 mianjalikadam@gmail.com 2 snilampatil2012@gmail.com #12 Department of Computer
More informationIntroduction to MapReduce
732A54 Big Data Analytics Introduction to MapReduce Christoph Kessler IDA, Linköping University Towards Parallel Processing of Big-Data Big Data too large to be read+processed in reasonable time by 1 server
More informationRESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury
RESTORE: REUSING RESULTS OF MAPREDUCE JOBS Presented by: Ahmed Elbagoury Outline Background & Motivation What is Restore? Types of Result Reuse System Architecture Experiments Conclusion Discussion Background
More informationEfficient Map Reduce Model with Hadoop Framework for Data Processing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
More informationDynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce
Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce Shiori KURAZUMI, Tomoaki TSUMURA, Shoichi SAITO and Hiroshi MATSUO Nagoya Institute of Technology Gokiso, Showa, Nagoya, Aichi,
More informationA Database System Performance Study with Micro Benchmarks on a Many-core System
DEIM Forum 2012 D6-3 A Database System Performance Study with Micro Benchmarks on a Many-core System Fang XI Takeshi MISHIMA and Haruo YOKOTA Department of Computer Science, Graduate School of Information
More informationShark: Hive on Spark
Optional Reading (additional material) Shark: Hive on Spark Prajakta Kalmegh Duke University 1 What is Shark? Port of Apache Hive to run on Spark Compatible with existing Hive data, metastores, and queries
More informationThe amount of data increases every day Some numbers ( 2012):
1 The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect
More informationResults and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets
Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Sheetal K. Labade Computer Engineering Dept., JSCOE, Hadapsar Pune, India Srinivasa Narasimha
More informationSDA: Software-Defined Accelerator for general-purpose big data analysis system
SDA: Software-Defined Accelerator for general-purpose big data analysis system Jian Ouyang(ouyangjian@baidu.com), Wei Qi, Yong Wang, Yichen Tu, Jing Wang, Bowen Jia Baidu is beyond a search engine Search
More information2/26/2017. The amount of data increases every day Some numbers ( 2012):
The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to
More informationSeptember 2013 Alberto Abelló & Oscar Romero 1
duce-i duce-i September 2013 Alberto Abelló & Oscar Romero 1 Knowledge objectives 1. Enumerate several use cases of duce 2. Describe what the duce environment is 3. Explain 6 benefits of using duce 4.
More informationJignesh M. Patel. Blog:
Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor
More informationI ++ Mapreduce: Incremental Mapreduce for Mining the Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More information