Using Hive for Data Warehousing


An IBM Proof of Technology Using Hive for Data Warehousing Unit 1: Exploring Hive

An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2013 US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Lab 1 Exploring Hive
   1.1 Getting Started
   1.2 Hive and the Web Console
      1.2.1 Starting/Stopping Hive from the BigInsights Web Console
      1.2.2 Hive Web Interface
   1.3 Exploring the Hive environment
      1.3.1 Investigating Hive directory structure with the console
      1.3.2 Exploring the Hive Command Line Interface (CLI)
   1.4 Summary

Lab 1 Exploring Hive

The overwhelming trend towards digital services, combined with cheap storage, has generated massive amounts of data that enterprises need to gather, process, and analyze effectively. Data analysis techniques from the data warehouse and high-performance computing communities are invaluable for many enterprises; however, their cost or the complexity of scaling them up often discourages the accumulation of data without an immediate need. Because valuable knowledge may nevertheless be buried in this data, related scaled-up technologies have been developed. Examples include Google's MapReduce and its open-source implementation, Apache Hadoop.

Writing MapReduce programs to analyze your Big Data can get complex. Apache Hive can make querying your data much easier. Hive, first created at Facebook, is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query the data using a SQL-like language called HiveQL.

After completing this hands-on lab, you will be able to:

   Start and stop Hive from both the command line and the BigInsights Web Console.
   Use the Linux command line to explore the Hive directory structure.
   Interact with the Hive CLI in interactive mode, in one-shot mode, and via a file.

Allow 30 to 45 minutes to complete this section of the lab.

This version of the lab was designed using the InfoSphere BigInsights 2.1 Quick Start Edition. Throughout this lab you will be using the following account login information:

                           Username   Password
   VM image setup screen   root       password
   Linux                   biadmin    biadmin

1.1 Getting Started

To prepare for this lab, you must first start all of the Hadoop components. These instructions assume you have already followed the IBM InfoSphere BigInsights Quick Start Edition v2.1 README setup guide.

1. If the VMware image is not already running, start it by clicking the Play virtual machine button in VMware Player.
2. Log in to the VMware virtual machine using the following credentials:
   User: biadmin
   Password: biadmin

3. After you log in, the desktop is displayed.

Before we can start working with Hive and the Hadoop Distributed File System, we must first start all the BigInsights components. There are two ways of doing this: from the terminal, or by double-clicking a desktop icon. Both methods are shown in the following steps.

4. Open the terminal by double-clicking the BigInsights Shell icon.
5. Click the Terminal icon.

6. Once the terminal is open, change to the $BIGINSIGHTS_HOME/bin directory ($BIGINSIGHTS_HOME is /opt/ibm/biginsights by default):
   $ cd $BIGINSIGHTS_HOME/bin
   or
   $ cd /opt/ibm/biginsights/bin
7. Start the Hadoop components (daemons) on the BigInsights server. The following command starts all components; note that it takes a few minutes to run.
   $ ./start-all.sh
8. Sometimes certain Hadoop components fail to start. You can start and stop the failed components one at a time by using start.sh and stop.sh, respectively. For example, to start and stop Hive, use:
   $ ./start.sh hive
   $ ./stop.sh hive

Notice that because Hive did not fail to start initially, the terminal reports that Hive is already running.

9. Once all components have started successfully, you may move on.
10. To stop all components, you would run the command below. For this lab, however, leave all components started.
   $ ./stop-all.sh

Next, let us look at how to start all the components by double-clicking an icon.

11. Double-clicking the Start BigInsights icon runs a script that performs the steps above. Once all components are started, the terminal exits and you are set.
12. You can stop the components in a similar manner by double-clicking the Stop BigInsights icon (to the right of the Start BigInsights icon).

Now that the components are started, you may move on to the next section.
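The per-component start and stop commands from this section could be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name bi_component and the DRY_RUN variable are hypothetical and not part of BigInsights; only start.sh, stop.sh, and the default /opt/ibm/biginsights path come from the steps above. By default the helper just prints the command it would run.

```shell
# Hypothetical helper around the start.sh/stop.sh scripts used in this lab.
# DRY_RUN defaults to 1, so the command is only printed; set DRY_RUN=0 to run it.
bi_component() {
    local action="$1"
    local component="$2"
    local bin_dir="${BIGINSIGHTS_HOME:-/opt/ibm/biginsights}/bin"

    # Accept only the two actions that have a matching script in the lab.
    case "$action" in
        start|stop) ;;
        *)
            echo "usage: bi_component start|stop <component>" >&2
            return 2
            ;;
    esac
    if [ -z "$component" ]; then
        echo "usage: bi_component start|stop <component>" >&2
        return 2
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show the command instead of running it.
        echo "cd $bin_dir && ./$action.sh $component"
    else
        # Run from the bin directory, as the lab steps do.
        (cd "$bin_dir" && "./$action.sh" "$component")
    fi
}

# Example: show the commands that would restart Hive.
bi_component stop hive
bi_component start hive
```

Printing by default is a deliberate choice for a sketch like this: it lets you confirm the path and component name before anything is stopped on the virtual machine.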

1.2 Hive and the Web Console

Hive can also be started and stopped easily from the BigInsights Web Console. Additionally, we can work with Hive from the Hive web interface that is packaged with Apache Hive.

1.2.1 Starting/Stopping Hive from the BigInsights Web Console

1. Start the Web Console by double-clicking the BigInsights WebConsole icon.
2. Once logged in, click the Cluster Status tab at the top of the page.
3. Click the Hive service and note the detailed information provided for this service in the pane at right, such as the URL for Hive's Web interface and its process ID. From here, you can also start or stop the Hive service as needed.
4. In the pane at right (which displays the Hive status), click the red Stop button to stop the service.
5. When prompted to confirm that you want to stop the Hive service, click OK and wait for the operation to complete. The right pane then shows the updated Hive status.

6. Restart the Hive service by clicking the green arrow just beneath the Hive Status heading. When the operation completes, the Web Console indicates that Hive is running again, likely under a process ID that differs from the one shown earlier in this lab module. (You may need to use your Web browser's Refresh button to reload the information displayed in the left pane.)

1.2.2 Hive Web Interface

1. Copy and paste the URL for Hive's Web interface (http://bivm:9999/hwi) into a new tab of your browser. The open-source Hive Web Interface, provided with Hive for administration purposes, is displayed.
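If you prefer to confirm from the terminal that the web interface is answering, something like the following sketch may help. The helper names hwi_url and check_hwi are hypothetical, and the check assumes curl is installed on the machine where you run it; only the host bivm, the port 9999, and the /hwi path come from the step above.

```shell
# Hypothetical helper: build the Hive Web Interface URL for a host and port.
# Defaults match the URL used in this lab.
hwi_url() {
    local host="${1:-bivm}"
    local port="${2:-9999}"
    echo "http://${host}:${port}/hwi"
}

# Hypothetical helper: report whether the interface answers over HTTP.
# Assumes curl is available; waits at most 5 seconds for a response.
check_hwi() {
    local url
    url="$(hwi_url "$@")"
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
        echo "reachable: $url"
    else
        echo "not reachable: $url"
    fi
}

# Print the default URL; call check_hwi on the VM to test it.
hwi_url
```

Running check_hwi right after restarting Hive from the Web Console is one way to see whether the service has finished coming back up.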

1.3 Exploring the Hive environment

1.3.1 Investigating Hive directory structure with the console

Let's navigate to the Hive home directory on the Linux file system and investigate the directories that make up Hive.

1. Open the Linux terminal by double-clicking the BigInsights Shell icon on the desktop.
2. Click the Terminal icon.
3. In the terminal, change to the Hive home directory:
   $ cd $HIVE_HOME
   Note: This is equivalent to
   $ cd $BIGINSIGHTS_HOME/hive
4. Check the current directory:
   $ pwd
   You are now in the /opt/ibm/biginsights/hive directory. This is where Hive is set up on this BigInsights virtual machine.
5. Explore the directory structure inside the hive folder by running the ls command:
   $ ls

6. You will notice the following directories:
   bin        executables to start, stop, configure, and check the status of Hive
   lib        server's JAR files
   conf       Hive environment, metastore, security, and log configuration files
   docs       Hive documentation
   scripts    scripts for upgrading Derby and MySQL metastores from one version of Hive to the next
   examples   Hive examples
   src        Hive source and test scripts

1.3.2 Exploring the Hive Command Line Interface (CLI)

From the Hive CLI shell you can perform queries, DML, DDL, and more. We will do much of our work in the Hive CLI, so let's briefly check it out!

1. In the Linux terminal, change into the $HIVE_HOME/bin directory:
   $ cd $HIVE_HOME/bin
2. Inside the bin directory, run a command that shows the command-line options for the Hive CLI:
   $ ./hive --help --service cli

3. Page through the configuration variables already set in the Hive CLI:
   $ ./hive -S -e "set" | more

4. Execute a Hive one-shot command (designated by the -e option) to show the current schemas in the system. You should see that only the default schema is listed (schema and database are equivalent terms in Hive).
   $ ./hive -S -e "SHOW SCHEMAS;"
   Note that the -S option in the above command stands for silent mode and suppresses some inessential output.
5. Create a new file in the /tmp directory with a simple HQL command inside it:
   $ echo "SHOW DATABASES;" > /tmp/myfile.hql
6. Tell Hive to run the commands in your file by passing the -f option:
   $ ./hive -f /tmp/myfile.hql
   Note the output: Hive lists only a single database, the default Hive database.
7. Start an interactive Hive shell session:
   $ ./hive

8. Run the SHOW DATABASES statement from within the interactive Hive session:
   hive> SHOW DATABASES;
9. Quit Hive:
   hive> quit;

1.4 Summary

Congratulations! You now know how to start and stop Hive using the terminal and the BigInsights Web Console. You can navigate to the Hive directories and understand their contents. You also know how to interact with the Hive CLI. You may move on to the next unit.
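As a recap of the non-interactive CLI modes covered in section 1.3.2, the one-shot (-e) and file (-f) invocations can be combined in a short script. This is a sketch, not part of the lab: it assumes HIVE_HOME points at the Hive installation (falling back to the /opt/ibm/biginsights/hive path seen earlier), and it skips the queries when no hive executable is found there.

```shell
# Write a one-statement HQL script, as in step 5 of section 1.3.2.
hql_file="${TMPDIR:-/tmp}/myfile.hql"
printf '%s\n' 'SHOW DATABASES;' > "$hql_file"

# Assumption: HIVE_HOME points at the Hive installation used in this lab.
hive_bin="${HIVE_HOME:-/opt/ibm/biginsights/hive}/bin/hive"

if [ -x "$hive_bin" ]; then
    # One-shot mode: -e runs the quoted statement, -S suppresses extra output.
    "$hive_bin" -S -e "SHOW DATABASES;"
    # File mode: -f runs every statement in the file.
    "$hive_bin" -f "$hql_file"
else
    echo "hive not found at $hive_bin; skipping the queries"
fi
```

Quoting the statement matters in one-shot mode: without the quotes, the shell treats the semicolon as the end of the command rather than passing it to Hive.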

Copyright IBM Corporation 2013. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.

IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.