Facebook data extraction using R & process in Data Lake

Size: px
Start display at page:

Download "Facebook data extraction using R & process in Data Lake"

Transcription

1 Facebook data extraction using R & process in Data Lake An approach to understand how retail companie B s y G c a a ut n am p Go e sw rf a o m r i m Facebook data mining to analyze customers behavioral pattern By OnlineGuwahati Team

2 Abstract Without visibility on social media, it's extremely tough for the brands to interact with below ground influencers who molds the future. Now a day's almost 94 per cent of buying decisions are based on exponential growth of user participation on social media (mainly on Facebook). Facebook is playing a critical role to increase brand awareness. Businesses need to transform digital native customers into brand advocates and this can only be done if the relationship has been nurtured. The key for brands is to encourage consumers to endorse the brand and play a real part within the business. Facebook data mining is becoming a major factor in taking accurate business decisions by analysis of user posts, comments, likes, shares etc. as well as the sentiment analysis on business page. To analyze data, we should have proper mechanism to extract data from the business page. With effective utilization of R, a programming language for statistical analysis and Hadoop Distributed File Systems (HDFS) with ELT (Extraction, Loading and Transformation) approach, this E-Book has describe in details how we can perform Facebook data mining. Part I Facebook application creation and R installation Before creating a Facebook application, we need to have a fair knowledge on Facebook platform. A set of application programming interfaces (API) and tools have been developed by Facebook for the third party developers so that they can create applications to leverage and interact with core Facebook features. Entire set of API and tools are as a whole denoted as Facebook platform. Besides, following are the high level components consolidated in Facebook platform. - Graph API can be utilized by application developer to read from and write data to Facebook. Graph API provides an overview of the Facebook social graph and the relationships between entities therein. - Authentication allows applications to interact with Facebook and users to sign on to various applications through Facebook via a PC, mobile phone or desktop app. - Social plug-in like "Like button by which application developers are allowed to give their users a social experience through Facebook without gaining access to Facebook users' information.

3 - Open graph protocol which can be utilized by application developers to integrate their pages with Facebook. - IFrames can be used to create applications those can be accessed via Facebook login, but are hosted separately from Facebook. - Microformats, is a component that allows Facebook users to move these details to their own calendars or to mapping applications. Facebook application creation Facebook application is a small application which is developed for the Facebook profiles. As a first step we need to have an account with Facebook. Login to developers.facebook.com if already has an account. After successful login, the dropdown box at the right hand top corner will change to My Apps. On extending it, we will have option to add a new application as shown in picture below.

4 Now we can create a new application by adding few inputs. Applications are already built internally by Facebook. Based on our category as well as display name selection, an ID would be generated and assign to it. Since we are going to extract page data, so category selection should be of type "Apps for the page" from the dropdown box. After clicking on "Create App Id ", new App ID will be generated with options to execute multiple operations on it. To get all the mandatory information about the application, we need to navigate through "Dashboard" that appears on left side below the application name. App ID and App Secret are mandatory and sensitive information and hence need to be copied in safe file to avoid disclosure to others. Now we need to proceed towards "Settings" panel to add platform that would be "Website".

5 We are almost done with the application creation except Website Site-URL input. Leave the browser without sign-out from developers.facebook.com. Next Step is R installation and R-Studio setup. R installation and R-Studio setup R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of platforms viz. UNIX, Windows and MacOS. R encompasses effective data handling and storage facility with a suit of operators for calculations on arrays, in particular matrices. Eventually it is a simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities. In order to download R, we can choose preferred CRAN mirror. Here is the URL of the mirror If we choose operating system as Windows, it will appear as R win.exe after download. Below picture shows how R console looks like after successful installation.

6 RStudio is an integrated development environment (IDE) for R. It makes R easier to be used. RStudio is a consolidated platform that includes code editor, debugging and visualization tool. Besides, a console, syntax-highlighting that supports direct code execution, tools for plotting, history and workspace management are key components in RStudio. RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux). As a beginner/ learner, you can go for open source and get it from Note:- Without proper installation of R, RStudio doesn t work, because RStudio runs on top R environment. We can correlate this with Eclipse studio where Java runtime should be installed first to run it.

7 Above picture shows how RStudio looks in Windows environment. Even though several basic packages come by default with RStudio, we need to Install Rfacebook package from CRAN separately. Additional packages can be installed by navigating top menu "Tools" -> install Packages. Here we need to have "Rfacebook". After successful installation, it's mandatory to check whether all required packages are available or not. Those are visible in the right side below and here is the list of packages.

8 Finally we can see a message in RStudio editor. Part 2 Generate and assign OAuth token to Facebook R session. R internally makes a call to the Facebook API to get all the relevant information but call/invocation to the Facebook API has to be authenticated first. That means, before releasing any data from its repository, Facebook API verifies whether methods in the API gets invoked from trusted source. Once we generate OAuth token using App ID and App Secret as explained in Part 1, R internally use it during the session to invoke various methods on Facebook API. Following are the steps for OAuth Token creation, which will subsequently be assigned to R session. Start the RStudio and load the library "Rfacebook"

9 Get the library into RStudio editor by > library ("Rfacebook") Pass the value of App ID and App Secret to the method fboauth as parameter and returned value should store in a local variable say og_oauth. Value of App ID and App Secret are already noted in Part 1. After entering the above command, a site URL gets generated as that we need to add into Facebook application created in Part 1 and click "Save changes" button on Facebook application page.

10 After that, press any key or Enter in the RStudio editor. Immediately a new browser tab will be opened with Facebook page to get the password again. Once entered, we see a message as Authentication complete. Please close this page and return to R." on the page. We can save the variable "og_oauth" as a file to be re-used in future sessions, which in fact can be used as token in functions by the following command in RStudio editor For testing, we can extract information for the logged in user viz. name of the user, total Likes etc. To execute that, we need to invoke getuser() method where User Token of the created Facebook application has to be passed. To access User Token, we need to navigate the Tools & Support menu in developer.facebook.com where application is created.

11 Copy the User Token and add as parameter in the RStudio editor. "myself" is a local variable/object here which getuser() method returns if we type "myself$name" in editor, Name of the logged in user gets displayed. As the connection between Facebook API & R is established now, we are ready to extract the data from any company s facebook page for analysis. Part 3 Extraction of post/data from the public facebook page Here we are considering electronic commerce company's pages for precisely analyzing customer's feedback, sentiments and other information. In real time business a group of customers or other interested individuals give the companies a convenient way to keep the members informed about products and services and share information. Besides, posts, comments in groups in Facebook page help companies to understand customers /buyers expectations. To attract customers from rival companies and retain customers with same company, companies need to enhance/improve products/service in an innovative way referring to the critics, complaints posted on rival company Facebook page.

12 Once those posts/information are extracted and analyzed, it will be very easy to decide on the additional steps to be performed, those the rival companies have overlooked The data what we have extracted from Facebook is unstructured and it can't be processed in traditional RDBMDS (Databases like Oracle, MySQL, IBM's DB2 etc.) as well as mining too to understand customer's behaviors' etc. In a similar ways we can consolidate huge volume of Facebook data from the companies pages. Besides, twitter streaming data is another source where we can analyze customer's sentiments, their views etc. The data produced in the twitter streaming is semi-structured that is JSON (Java script Object Notation) that is light weight compare to XML Part 4 Process data by adopting ELT in a distributed manner on Data Lake. The basic concept of Data Lake where we can use the approach ELT (Extraction, Loading and then Transformation) against traditional ETL (Extraction, Transformation and then loading) process. ETL process implies to traditional data warehousing system where structured data format follows (raw and column). The main characteristic of a Data Lake is that data is not

13 classified when gets persisted and due to that the data preparation, cleansing and subsequently transformation activities are being eliminated. In Data Warehouse, these activities consume major time of whole process. By storing Facebook's comments, reviews, shares, 'Like' etc. in its rawest form enables business analyst of the organizations specific to retail domain with multichannel e-commerce as a separate vertical to find answers from those data for which they do not know the questions yet. By leveraging HDFS (Hadoop Distributed File System), we can develop Data Lake to store any format data in order to process and analysis. Directly data can be loaded in the Lake without transformation, later transformation can be performed on demand basis. There are few components available using those we can ingest data (independent of any Format) into lake like Flume, Sqoop, efficient file transfer mechanism etc. Node Node Node Node Node HDFS Node

14 Since we are not going to analyze run time streaming data, we need to follow batch processing approach where data first gets persisted in the lake. As a next step, all the junk, invalid, noisy data should be removed and that can be implemented using multiple Map-Reduce chain which in turn also detonated as data pipeline. Data quality is a major challenge in the data lake implementation even though it makes possible to store all the data, ask complex and radically bigger business questions, and find out hidden patterns and relationships from the data. After successful data crunching in an optimized way, responsibilities goes to data scientist to perform any ad-hoc queries, and build an advanced model at any time iteratively. By adopting machine learning technique, we can start to explore data, as well as to build predictive models for customer's interest in buying products. If our objective is to build a model that can predict the products which customer is buying then proceed with supervised approach. Unsupervised approach like clustering or PCA will likely mix the information that customers are interested in with other, unrelated, products and generate a worse predictor. By levering supervised machine learning techniques, accumulating all available historical information can predict trends and effects of seasonality for planning and forecasting. On top of it, natural language processing (NLP) can effectively utilized to analyze consumer sentiments and apply those as inputs to the planning and forecasting process to organize the products in the channel of "Related Products" or "You may like also" in the E-Commerce portal. The retailer can penetrate insight to their customer as well as visitors by performing forecasting and planning through machine learning on those extracted Facebook data which in turns helps to improve performance and profitability. Applying automated complex task such as NLP on Facebook's unstructured massive data sets collection, an actionable intelligence can be developed to achieve better understanding of their customers. In addition to that, information achieved from the datasets collection by machine learning can be leveraged to channelize the business and operational reshuffle to improve business decisions. In this e-book, we have just discussed the approach how to analyze the Facebook's data in Data Lake using machine learning. So it can't be taken as cookbook to implement in real scenarios. There are large numbers of technical steps with complete description and configuration involves which we have not covered here. This e- book is presented to get the flavor of Facebook's unstructured data mining using R and Hadoop (A popular Big Data processing framework )

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Overview of Data Services and Streaming Data Solution with Azure

Overview of Data Services and Streaming Data Solution with Azure Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

Talend Big Data Sandbox. Big Data Insights Cookbook

Talend Big Data Sandbox. Big Data Insights Cookbook Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

Integrating SAS Analytics into Your Web Page

Integrating SAS Analytics into Your Web Page Paper SAS2145-2018 Integrating SAS Analytics into Your Web Page James Kochuba and David Hare, SAS Institute Inc. ABSTRACT SAS Viya adds enhancements to the SAS Platform that include the ability to access

More information

@Pentaho #BigDataWebSeries

@Pentaho #BigDataWebSeries Enterprise Data Warehouse Optimization with Hadoop Big Data @Pentaho #BigDataWebSeries Your Hosts Today Dave Henry SVP Enterprise Solutions Davy Nys VP EMEA & APAC 2 Source/copyright: The Human Face of

More information

Chapter 6 VIDEO CASES

Chapter 6 VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

SugarCRM for Hootsuite

SugarCRM for Hootsuite SugarCRM for Hootsuite User Guide Document izeno Pte Ltd Version 1.0 Hootsuite for Sugar User Guide P a g e 1 Revision / Modifications SN Version Date Author Comments 1 0.1 Wed 20 December 2017 Kris Haryadi

More information

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński Introduction into Big Data analytics Lecture 3 Hadoop ecosystem Janusz Szwabiński Outlook of today s talk Apache Hadoop Project Common use cases Getting started with Hadoop Single node cluster Further

More information

Microsoft Exam

Microsoft Exam Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred

More information

POWER BI BOOTCAMP. COURSE INCLUDES: 4-days of instructor led discussion, Hands-on Office labs and ebook.

POWER BI BOOTCAMP. COURSE INCLUDES: 4-days of instructor led discussion, Hands-on Office labs and ebook. Course Code : AUDIENCE : FORMAT: LENGTH: POWER BI BOOTCAMP O365-412-PBID (CP PBD365) Professional Developers Instructor-led training with hands-on labs 4 Days COURSE INCLUDES: 4-days of instructor led

More information

Hadoop Online Training

Hadoop Online Training Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the

More information

Capture Business Opportunities from Systems of Record and Systems of Innovation

Capture Business Opportunities from Systems of Record and Systems of Innovation Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information

More information

Create-A-Page Design Documentation

Create-A-Page Design Documentation Create-A-Page Design Documentation Group 9 C r e a t e - A - P a g e This document contains a description of all development tools utilized by Create-A-Page, as well as sequence diagrams, the entity-relationship

More information

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide FAQs 1. What is the browser compatibility for logging into the TCS Connected Intelligence Data Lake for Business Portal? Please check whether you are using Mozilla Firefox 18 or above and Google Chrome

More information

Designing your BI Architecture

Designing your BI Architecture IBM Software Group Designing your BI Architecture Data Movement and Transformation David Cope EDW Architect Asia Pacific 2007 IBM Corporation DataStage and DWE SQW Complex Files SQL Scripts ERP ETL Engine

More information

Talend Big Data Sandbox. Big Data Insights Cookbook

Talend Big Data Sandbox. Big Data Insights Cookbook Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is

More information

BEAWebLogic. Portal. Overview

BEAWebLogic. Portal. Overview BEAWebLogic Portal Overview Version 10.2 Revised: February 2008 Contents About the BEA WebLogic Portal Documentation Introduction to WebLogic Portal Portal Concepts.........................................................2-2

More information

Přehled novinek v SQL Server 2016

Přehled novinek v SQL Server 2016 Přehled novinek v SQL Server 2016 Martin Rys, BI Competency Leader martin.rys@adastragrp.com https://www.linkedin.com/in/martinrys 20.4.2016 1 BI Competency development 2 Trends, modern data warehousing

More information

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems 1 Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems The Defacto Choice For Convergence 2 ABSTRACT & SPEAKER BIO Dealing with enormous data growth is a key challenge for

More information

QLIKVIEW ARCHITECTURAL OVERVIEW

QLIKVIEW ARCHITECTURAL OVERVIEW QLIKVIEW ARCHITECTURAL OVERVIEW A QlikView Technology White Paper Published: October, 2010 qlikview.com Table of Contents Making Sense of the QlikView Platform 3 Most BI Software Is Built on Old Technology

More information

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data

More information

Data in the Cloud and Analytics in the Lake

Data in the Cloud and Analytics in the Lake Data in the Cloud and Analytics in the Lake Introduction Working in Analytics for over 5 years Part the digital team at BNZ for 3 years Based in the Auckland office Preferred Languages SQL Python (PySpark)

More information

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can

More information

Welcome to the Investor Experience

Welcome to the Investor Experience Welcome to the Investor Experience Welcome to the Black Diamond Investor Experience, a platform that allows advisors to customize how they present information to their clients. This document provides important

More information

Boost your Analytics with ML for SQL Nerds

Boost your Analytics with ML for SQL Nerds Boost your Analytics with ML for SQL Nerds SQL Saturday Spokane Mar 10, 2018 Julie Koesmarno @MsSQLGirl mssqlgirl.com jukoesma@microsoft.com Principal Program Manager in Business Analytics for SQL Products

More information

SAS Data Integration Studio 3.3. User s Guide

SAS Data Integration Studio 3.3. User s Guide SAS Data Integration Studio 3.3 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS Data Integration Studio 3.3: User s Guide. Cary, NC: SAS Institute

More information

SQL Server 2017 Power your entire data estate from on-premises to cloud

SQL Server 2017 Power your entire data estate from on-premises to cloud SQL Server 2017 Power your entire data estate from on-premises to cloud PREMIER SPONSOR GOLD SPONSORS SILVER SPONSORS BRONZE SPONSORS SUPPORTERS Vulnerabilities (2010-2016) Power your entire data estate

More information

PROCE55 Mobile: Web API App. Web API. https://www.rijksmuseum.nl/api/...

PROCE55 Mobile: Web API App. Web API. https://www.rijksmuseum.nl/api/... PROCE55 Mobile: Web API App PROCE55 Mobile with Test Web API App Web API App Example This example shows how to access a typical Web API using your mobile phone via Internet. The returned data is in JSON

More information

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

SQL Server Machine Learning Marek Chmel & Vladimir Muzny SQL Server Machine Learning Marek Chmel & Vladimir Muzny @VladimirMuzny & @MarekChmel MCTs, MVPs, MCSEs Data Enthusiasts! vladimir@datascienceteam.cz marek@datascienceteam.cz Session Agenda Machine learning

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

What's New in ActiveVOS 7.1 Includes ActiveVOS 7.1.1

What's New in ActiveVOS 7.1 Includes ActiveVOS 7.1.1 What's New in ActiveVOS 7.1 Includes ActiveVOS 7.1.1 2010 Active Endpoints Inc. ActiveVOS is a trademark of Active Endpoints, Inc. All other company and product names are the property of their respective

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

SAS IT Resource Management 3.8: Reporting Guide

SAS IT Resource Management 3.8: Reporting Guide SAS IT Resource Management 3.8: Reporting Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. SAS IT Resource Management 3.8: Reporting Guide.

More information

WebCenter Interaction 10gR3 Overview

WebCenter Interaction 10gR3 Overview WebCenter Interaction 10gR3 Overview Brian C. Harrison Product Management WebCenter Interaction and Related Products Summary of Key Points AquaLogic Interaction portal has been renamed

More information

iway Big Data Integrator New Features Bulletin and Release Notes

iway Big Data Integrator New Features Bulletin and Release Notes iway Big Data Integrator New Features Bulletin and Release Notes Version 1.5.2 DN3502232.0717 Active Technologies, EDA, EDA/SQL, FIDEL, FOCUS, Information Builders, the Information Builders logo, iway,

More information

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG Prof R.Angelin Preethi #1 and Prof J.Elavarasi *2 # Department of Computer Science, Kamban College of Arts and Science for Women, TamilNadu,

More information

Expose Existing z Systems Assets as APIs to extend your Customer Reach

Expose Existing z Systems Assets as APIs to extend your Customer Reach Expose Existing z Systems Assets as APIs to extend your Customer Reach Unlocking mainframe assets for mobile and cloud applications Asit Dan z Services API Management, Chief Architect asit@us.ibm.com Insert

More information

MicroStrategy Academic Program

MicroStrategy Academic Program MicroStrategy Academic Program Creating a center of excellence for enterprise analytics and mobility. HOW TO DEPLOY ENTERPRISE ANALYTICS AND MOBILITY ON AWS APPROXIMATE TIME NEEDED: 1 HOUR In this workshop,

More information

IBM C Rational Functional Tester for Java. Download Full Version :

IBM C Rational Functional Tester for Java. Download Full Version : IBM C2140-842 Rational Functional Tester for Java Download Full Version : http://killexams.com/pass4sure/exam-detail/c2140-842 QUESTION: 44 Which statement is true about the Time Delayed method when you

More information

Khadija Souissi. Auf z Systems November IBM z Systems Mainframe Event 2016

Khadija Souissi. Auf z Systems November IBM z Systems Mainframe Event 2016 Khadija Souissi Auf z Systems 07. 08. November 2016 @ IBM z Systems Mainframe Event 2016 Acknowledgements Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation.

More information

Integration Service. Admin Console User Guide. On-Premises

Integration Service. Admin Console User Guide. On-Premises Kony Fabric Integration Service Admin Console User Guide On-Premises Release V8 SP1 Document Relevance and Accuracy This document is considered relevant to the Release stated on this title page and the

More information

R Language for the SQL Server DBA

R Language for the SQL Server DBA R Language for the SQL Server DBA Beginning with R Ing. Eduardo Castro, PhD, Principal Data Analyst Architect, LP Consulting Moderated By: Jose Rolando Guay Paz Thank You microsoft.com idera.com attunity.com

More information

Logi Ad Hoc Reporting Management Console Usage Guide

Logi Ad Hoc Reporting Management Console Usage Guide Logi Ad Hoc Reporting Management Console Usage Guide Version 12.1 July 2016 Page 2 Contents Introduction... 5 Target Audience... 5 System Requirements... 6 Components... 6 Supported Reporting Databases...

More information

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality?

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality? Oliver Engels & Tillmann Eitelberg Big Data! Big Quality? Like to visit Germany? PASS Camp 2017 Main Camp 5.12 7.12.2017 (4.12 Kick Off Evening) Lufthansa Training & Conference Center, Seeheim SQL Konferenz

More information

Oracle Service Cloud. Release 18D. What s New

Oracle Service Cloud. Release 18D. What s New Oracle Service Cloud Release 18D What s New TABLE OF CONTENTS Revision History 3 Overview 3 Feature Summary 3 Agent Browser Channels 4 Chat Transfer Enhancements 4 Agent Browser Workspaces 5 Link and Unlink

More information

SAS 9.4 Intelligence Platform: Overview, Second Edition

SAS 9.4 Intelligence Platform: Overview, Second Edition SAS 9.4 Intelligence Platform: Overview, Second Edition SAS Documentation September 19, 2017 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS 9.4 Intelligence

More information

Working with Database Connections. Version: 18.1

Working with Database Connections. Version: 18.1 Working with Database Connections Version: 18.1 Copyright 2018 Intellicus Technologies This document and its content is copyrighted material of Intellicus Technologies. The content may not be copied or

More information

Introduction to Big-Data

Introduction to Big-Data Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,

More information

WebSphere Puts Business In Motion. Put People In Motion With Mobile Apps

WebSphere Puts Business In Motion. Put People In Motion With Mobile Apps WebSphere Puts Business In Motion Put People In Motion With Mobile Apps Use Mobile Apps To Create New Revenue Opportunities A clothing store increases sales through personalized offers Customers can scan

More information

6/29/ :38 AM 1

6/29/ :38 AM 1 6/29/2017 11:38 AM 1 Creating an Event Hub In this lab, you will create an Event Hub. What you need for this lab An Azure Subscription Create an event hub Take the following steps to create an event hub

More information

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Chapter 3. Foundations of Business Intelligence: Databases and Information Management Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional

More information

Live Guide Co-browsing

Live Guide Co-browsing TECHNICAL PAPER Live Guide Co-browsing Netop develops and sells software solutions that enable swift, secure and seamless transfer of video, screens, sounds and data between two or more computers over

More information

8.0 Help for End Users About Jive for SharePoint System Requirements Using Jive for SharePoint... 6

8.0 Help for End Users About Jive for SharePoint System Requirements Using Jive for SharePoint... 6 for SharePoint 2010/2013 Contents 2 Contents 8.0 Help for End Users... 3 About Jive for SharePoint... 4 System Requirements... 5 Using Jive for SharePoint... 6 Overview of Jive for SharePoint... 6 Accessing

More information

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases

More information

Fast Innovation requires Fast IT

Fast Innovation requires Fast IT Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017 Spotfire: Brisbane Breakfast & Learn Thursday, 9 November 2017 CONFIDENTIALITY The following information is confidential information of TIBCO Software Inc. Use, duplication, transmission, or republication

More information

Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service

Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service Demo Introduction Keywords: Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service Goal of Demo: Oracle Big Data Preparation Cloud Services can ingest data from various

More information

If you re a Facebook marketer, you re likely always looking for ways to

If you re a Facebook marketer, you re likely always looking for ways to Chapter 1: Custom Apps for Fan Page Timelines In This Chapter Using apps for Facebook marketing Extending the Facebook experience Discovering iframes, Application Pages, and Canvas Pages Finding out what

More information

Safelayer's Adaptive Authentication: Increased security through context information

Safelayer's Adaptive Authentication: Increased security through context information 1 Safelayer's Adaptive Authentication: Increased security through context information The password continues to be the most widely used credential, although awareness is growing that it provides insufficient

More information

RECSM Summer School: Scraping the web. github.com/pablobarbera/big-data-upf

RECSM Summer School: Scraping the web. github.com/pablobarbera/big-data-upf RECSM Summer School: Scraping the web Pablo Barberá School of International Relations University of Southern California pablobarbera.com Networked Democracy Lab www.netdem.org Course website: github.com/pablobarbera/big-data-upf

More information

Active Endpoints. ActiveVOS Platform Architecture Active Endpoints

Active Endpoints. ActiveVOS Platform Architecture Active Endpoints Active Endpoints ActiveVOS Platform Architecture ActiveVOS Unique process automation platforms to develop, integrate, and deploy business process applications quickly User Experience Easy to learn, use

More information

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET SOLUTION SHEET Syncsort DMX-h Simplifying Big Data Integration Goals of the Modern Data Architecture Data warehouses and mainframes are mainstays of traditional data architectures and still play a vital

More information

WHAT IS THE CONFIGURATION TROUBLESHOOTER?

WHAT IS THE CONFIGURATION TROUBLESHOOTER? Paper 302-2008 Best Practices for SAS Business Intelligence Administrators: Using the Configuration Troubleshooter to Keep SAS Solutions and SAS BI Applications Running Smoothly Tanya Kalich, SAS Institute

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Workflow Management (August 31, 2017) docs.hortonworks.com Hortonworks Data Platform: Workflow Management Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks

More information

Solving Mobile App Development Challenges. Andrew Leggett & Abram Darnutzer CM First

Solving Mobile App Development Challenges. Andrew Leggett & Abram Darnutzer CM First Solving Mobile App Development Challenges Andrew Leggett & Abram Darnutzer CM First CM First WebClient Solutions CM WebClient Full desktop experience in browser CM WebClient Mobile Online mobile solution,

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

The Now Platform Reference Guide

The Now Platform Reference Guide The Now Platform Reference Guide A tour of key features and functionality START Introducing the Now Platform Digitize your business with intelligent apps The Now Platform is an application Platform-as-a-Service

More information

How Do I Inspect Error Logs in Warehouse Builder?

How Do I Inspect Error Logs in Warehouse Builder? 10 How Do I Inspect Error Logs in Warehouse Builder? Scenario While working with Warehouse Builder, the designers need to access log files and check on different types of errors. This case study outlines

More information

Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC

Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC 2018 Storage Developer Conference. Dell EMC. All Rights Reserved. 1 Data Center

More information

Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration WHITE PAPER / JANUARY 25, 2019 Table of Contents Introduction... 3 Harnessing the power of big data beyond the SQL world...

More information

IBM Endpoint Manager Version 9.0. Software Distribution User's Guide

IBM Endpoint Manager Version 9.0. Software Distribution User's Guide IBM Endpoint Manager Version 9.0 Software Distribution User's Guide IBM Endpoint Manager Version 9.0 Software Distribution User's Guide Note Before using this information and the product it supports,

More information

Power BI Developer Bootcamp

Power BI Developer Bootcamp Power BI Developer Bootcamp Mastering the Power BI Development Platform Course Code Audience Format Length Course Description Student Prerequisites PBD365 Professional Developers In-person and Remote 4

More information

Product Documentation. ER/Studio Portal. User Guide. Version Published February 21, 2012

Product Documentation. ER/Studio Portal. User Guide. Version Published February 21, 2012 Product Documentation ER/Studio Portal User Guide Version 1.6.3 Published February 21, 2012 2012 Embarcadero Technologies, Inc. Embarcadero, the Embarcadero Technologies logos, and all other Embarcadero

More information

Sharp Social. Natural Language Understanding

Sharp Social. Natural Language Understanding Sharp Social Natural Language Understanding Step 1 Go to the URL https://console.ng.bluemix.net/ and press enter. A new window appears of IBM Bluemix which asks you to sign up and create a Bluemix account.

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive

More information

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Keywords: Big Data, Oracle Big Data Appliance, Hadoop, NoSQL, Oracle

More information

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training:: Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional

More information

USER MANUAL. SalesPort Salesforce Customer Portal for WordPress (Lightning Mode) TABLE OF CONTENTS. Version: 3.1.0

USER MANUAL. SalesPort Salesforce Customer Portal for WordPress (Lightning Mode) TABLE OF CONTENTS. Version: 3.1.0 USER MANUAL TABLE OF CONTENTS Introduction...1 Benefits of Customer Portal...1 Prerequisites...1 Installation...2 Salesforce App Installation... 2 Salesforce Lightning... 2 WordPress Manual Plug-in installation...

More information

CUSTOMER PORTAL. Connectors Guide

CUSTOMER PORTAL. Connectors Guide CUSTOMER PORTAL Connectors Guide Connectors Clicking into this area will display connectors that can be linked to the portal. Once linked to the portal certain connectors will display information in the

More information

Certified Data Science with Python Professional VS-1442

Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional Certified Data Science with Python Professional Certification Code VS-1442 Data science has become

More information

Integration Service. Admin Console User Guide. On-Premises

Integration Service. Admin Console User Guide. On-Premises Kony MobileFabric TM Integration Service Admin Console User Guide On-Premises Release 7.3 Document Relevance and Accuracy This document is considered relevant to the Release stated on this title page and

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 12 Tutorial 3 Part 1 Twitter API In this tutorial, we will learn

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

WEBSITE INSTRUCTIONS. Table of Contents

WEBSITE INSTRUCTIONS. Table of Contents WEBSITE INSTRUCTIONS Table of Contents 1. How to edit your website 2. Kigo Plugin 2.1. Initial Setup 2.2. Data sync 2.3. General 2.4. Property & Search Settings 2.5. Slideshow 2.6. Take me live 2.7. Advanced

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing

More information

BUILD BETTER MICROSOFT SQL SERVER SOLUTIONS Sales Conversation Card

BUILD BETTER MICROSOFT SQL SERVER SOLUTIONS Sales Conversation Card OVERVIEW SALES OPPORTUNITY Lenovo Database Solutions for Microsoft SQL Server bring together the right mix of hardware infrastructure, software, and services to optimize a wide range of data warehouse

More information

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may

More information

Multi-Sponsor Environment. SAS Clinical Trial Data Transparency User Guide

Multi-Sponsor Environment. SAS Clinical Trial Data Transparency User Guide Multi-Sponsor Environment SAS Clinical Trial Data Transparency User Guide Version 6.0 01 December 2017 Contents Contents 1 Overview...1 2 Setting up Your Account...3 2.1 Completing the Initial Email and

More information

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud Microsoft Azure Databricks for data engineering Building production data pipelines with Apache Spark in the cloud Azure Databricks As companies continue to set their sights on making data-driven decisions

More information