Apache Hive Cookbook. Hanish Bansal Saurabh Chauhan Shrey Mehrotra BIRMINGHAM - MUMBAI

Apache Hive Cookbook Easy, hands-on recipes to help you understand Hive and its integration with frameworks that are used widely in today's big data world Hanish Bansal Saurabh Chauhan Shrey Mehrotra BIRMINGHAM - MUMBAI

Apache Hive Cookbook Copyright 2016 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information. First published: April 2016 Production reference: 1260416 Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK. ISBN 978-1-78216-108-0 www.packtpub.com

Credits

Authors: Hanish Bansal, Saurabh Chauhan, Shrey Mehrotra
Reviewer: Aristides Villarreal Bravo
Commissioning Editor: Wilson D'souza
Acquisition Editor: Tushar Gupta
Content Development Editor: Anish Dhurat
Technical Editor: Vishal K. Mewada
Copy Editor: Dipti Mankame
Project Coordinator: Bijal Patel
Proofreader: Safis Editing
Indexer: Priya Sane
Graphics: Kirk D'Penha
Production Coordinator: Shantanu N. Zagade
Cover Work: Shantanu N. Zagade

About the Authors

Hanish Bansal is a software engineer with over 4 years of experience in developing big data applications. He loves to study emerging solutions and applications, mainly those related to big data processing, NoSQL, natural language processing, and neural networks. He has worked with technologies such as Spring Framework, Hibernate, Hadoop, Hive, Flume, Kafka, and Storm; NoSQL databases, including HBase, Cassandra, and MongoDB; and search engines such as Elasticsearch. In 2012, he graduated in Information Technology from Jaipur Engineering College and Research Center, Jaipur, India. He was also the technical reviewer of the book Apache Zookeeper Essentials. In his spare time, he loves to travel and listen to music. You can read his blog at http://hanishblogger.blogspot.in/ and follow him on Twitter at https://twitter.com/hanishbansal786.

I would like to thank my parents for their love, support, encouragement, and the amazing chances they've given me over the years.

Saurabh Chauhan is a module lead with close to 8 years of experience in data warehousing and big data applications. He has worked with multiple Extract, Transform, and Load tools, such as Oracle Data Integrator and Informatica, as well as big data technologies such as Hadoop, Hive, Pig, Sqoop, and Flume. He completed his Bachelor of Technology in 2007 at Vishveshwarya Institute of Engineering and Technology. In his spare time, he loves to travel and discover new places. He also has a keen interest in sports.

I would like to thank everyone who has supported me throughout my life.

Shrey Mehrotra has 6 years of IT experience and has spent the past 4 years designing and architecting cloud and big data solutions for the governmental and financial domains. Having worked with big data R&D Labs and Global Data and Analytical Capabilities, he has gained insights into Hadoop, focusing on HDFS, MapReduce, and YARN. His technical strengths also include Hive, Pig, Spark, Elasticsearch, Sqoop, Flume, Kafka, and Java. He likes spending time performing R&D on different big data technologies. He is the coauthor of the book Learning YARN, a certified Hadoop developer, and has also written various technical papers. In his free time, he listens to music, watches movies, and spends time with friends.

I would like to thank my mom and dad for giving me support to accomplish anything I wanted. Also, I would like to thank my friends, who bear with me while I am busy writing.

About the Reviewer

Aristides Villarreal Bravo is a Java developer, a member of the NetBeans Dream Team, and a Java User Groups leader. He has organized and participated in various conferences and seminars related to Java, JavaEE, NetBeans, NetBeans Platform, free software, and mobile devices, nationally and internationally. He has written tutorials and blogs about Java, NetBeans, and web development. He has participated in several interviews on sites such as NetBeans, NetBeans Dzone, and JavaHispano. He has developed plugins for NetBeans. He has been a technical reviewer for the book PrimeFaces Blueprints. Aristides is the CEO of Javscaz Software Developers. He lives in Panamá.

To my mother, father, and all family and friends.

www.packtpub.com

ebooks, discount offers, and more

Did you know that Packt offers ebook versions of every book published, with PDF and epub files available? You can upgrade to the ebook version at www.packtpub.com and, as a print book customer, you are entitled to a discount on the ebook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.packtpub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and ebooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?

  Fully searchable across every book published by Packt
  Copy and paste, print, and bookmark content
  On demand and accessible via a web browser

Table of Contents

Preface

Chapter 1: Developing Hive
  Introduction
  Deploying Hive on a Hadoop cluster
  Deploying Hive Metastore
  Installing Hive
  Configuring HCatalog
  Understanding different components of Hive
  Compiling Hive from source
  Hive packages
  Debugging Hive
  Running Hive
  Changing configurations at runtime

Chapter 2: Services in Hive
  Introducing HiveServer2
  Understanding HiveServer2 properties
  Configuring HiveServer2 high availability
  Using HiveServer2 Clients
  Introducing the Hive metastore service
  Configuring high availability of metastore service
  Introducing Hue

Chapter 3: Understanding the Hive Data Model
  Introduction
  Using numeric data types
  Using string data types
  Using Date/Time data types
  Using miscellaneous data types
  Using complex data types
  Using operators
  Partitioning
  Partitioning a managed table
  Partitioning an external table
  Bucketing

Chapter 4: Hive Data Definition Language
  Introduction
  Creating a database schema
  Dropping a database schema
  Altering a database schema
  Using a database schema
  Showing database schemas
  Describing a database schema
  Creating tables
  Dropping tables
  Truncating tables
  Renaming tables
  Altering table properties
  Creating views
  Dropping views
  Altering the view properties
  Altering the view as select
  Showing tables
  Showing partitions
  Show the table properties
  Showing create table
  HCatalog
  WebHCat

Chapter 5: Hive Data Manipulation Language
  Introduction
  Loading files into tables
  Inserting data into Hive tables from queries
  Inserting data into dynamic partitions
  Writing data into files from queries
  Enabling transactions in Hive
  Inserting values into tables from SQL
  Updating data
  Deleting data