Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

Similar documents
Cleveland State University

Cleveland State University

Cleveland State University

Cleveland State University

CIS 601 Graduate Seminar in Computer Science Sunnie S. Chung

Big Data Hadoop Stack

1. Introduction to MapReduce

Sql Fact Constellation Schema In Data Warehouse With Example

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Data Warehouse and Data Mining

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Data-Intensive Distributed Computing

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

After completing this course, participants will be able to:

Microsoft Analytics Platform System (APS)

Contents. Part I Setting the Scene

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Data Lake Based Systems that Work

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu

Knowledge Discovery in Databases. Databases. date name surname street city account no. payment balance

M. PHIL. COMPUTER SCIENCE (FT / PT) PROGRAMME (For the candidates to be admitted from the academic year onwards)

BIG DATA TESTING: A UNIFIED VIEW

Mastering Data Warehouse Aggregates Solutions For Star Schema Performance

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Data Warehouse and Data Mining

XML: Extensible Markup Language

Data Analysis and Data Science

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG

ETL and OLAP Systems

Data Analytics at Logitech Snowflake + Tableau = #Winning

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Chapter 13 XML: Extensible Markup Language

Safe Harbor Statement

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Tutorial Outline. Map/Reduce vs. DBMS. MR vs. DBMS [DeWitt and Stonebraker 2008] Acknowledgements. MR is a step backwards in database access

New Technologies for Data Management

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

Course no: CSC- 451 Full Marks: Credit hours: 3 Pass Marks: Nature of course: Theory (3 Hrs.) + Lab (3 Hrs.)

Stages of Data Processing

Evolving To The Big Data Warehouse

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

Information Integration

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Microsoft Big Data and Hadoop

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

New Approaches to Big Data Processing and Analytics

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Oracle Big Data Fundamentals Ed 2

Oracle Database 11g: Data Warehousing Fundamentals

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY

Dealing with Data Especially Big Data

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

A REVIEW DATA CUBE ANALYSIS METHOD IN BIG DATA ENVIRONMENT

Oracle GoldenGate for Big Data

Map Reduce Group Meeting

A Review Paper on Big data & Hadoop

SQL Server Analysis Services

Embedded Technosolutions

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Adnan YAZICI Computer Engineering Department

Chapter 6 VIDEO CASES

Creating Connection With Hive. Version: 16.0

Big Data The end of Data Warehousing?

Big Data Analytics. Rasoul Karimi

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

FAST DATA RETRIEVAL USING MAP REDUCE: A CASE STUDY

BI ENVIRONMENT PLANNING GUIDE

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem.

Cloud, Big Data & Linear Algebra

MapReduce: Algorithm Design for Relational Operations

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City

Based on Big Data: Hype or Hallelujah? by Elena Baralis

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Data Informatics. Seon Ho Kim, Ph.D.

Big Data Facebook

APPLICATION OF HADOOP MAPREDUCE TECHNIQUE TOVIRTUAL DATABASE SYSTEM DESIGN. Neha Tiwari Rahul Pandita Nisha Chhatwani Divyakalpa Patil Prof. N.B.

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Big Data in Data Warehouses

Modern Data Warehouse The New Approach to Azure BI

SCHEME OF TEACHING AND EXAMINATION B.E. (ISE) VIII SEMESTER (ACADEMIC YEAR )

Hadoop/MapReduce Computing Paradigm

Business Cockpit. Controlling the digital enterprise. Fabio Casati Hewlett-Packard

Parallel Techniques for Big Data. Patrick Valduriez

Transcription:

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012

Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema Relational Aggregation Operators Data Cube Roll Up, Drill Down Papers to Cover An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri (Microsoft) and Umeshwar Dayal (HP Labs), in the proceedings of IEEE 1995 Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross Tab, and SubTotals by Jin Gray (Microsoft), et al, in the proceedings of IEEE 1996

Data Mining Introduction Overview of Data Mining Data Cleaning Process Main Concepts Association Rules Support and Confidence APRIORI Algorithm Optimization Techniques of Apriori Algorithm Example Pandora Book to Cover J. Han, M. Kamber (2001), Data Mining, Morgan Kaufmann Publishers, San Francisco, CA

Semistructured Data vs Structured Data Structured Data Relational Database, Data Warehouse SQL Semistructured Data XML, HTML XQuery, XPath Unstructured Data Text, Web Data (Mixed of Text, Image, Audio, Video) Problems and Solutions of Processing Semistructured/Unstructured Data Introduction of Mark Up Languages: XML, HTML Papers To Cover (to come)

Advanced XML Introduction of XML XML Schema, Semantics, Protocol XQuery, XPath XML Query Processing Papers To Cover XML 1.0(DOM and SAX2 APIs) XQuery 1.0 and XPath 2.0 Semantics XSLT

Information Retrieval: How Google Search Engine Works Web data Processing Building Relevancy Metric for Search Engine Example: Google Search Engine Papers to Cover( to come)

Map Reduce and Apache Hadoop Introduction of Google Map Reduce Introduction of Apache Hadoop HDFS Architecture Parallel programming and the MapReduce programming model Algorithms of Map Reduce/Apache Hadoop Papers to Cover MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean (Google) and Sanjay Ghemawat (Google) in the proceedings of OSDI 2004 //labs.google.com/papers/mapreduce-osdi04.pdf Lammal, Ralf. Google's MapReduce Programming Model Revisited. http://www.cs.vu.nl/~ralf/mapreduce/paper.pdf Open Source MapReduce http://lucene.apache.org/hadoop/ Apache Hadoop in White Papers by Apache, Yahoo

Data Warehouse on Parallel Processing Architecture of Parallel Processing Flow of Data Processing in Data Warehouse on Parallel Processing Architecture - Retrieval - Aggregation Example: Teradata Parallel Architecture Oracle Parallel Architecture

Business Intelligence What is Business Intelligence?? Data Mining on Web Transaction Data of Business Upcoming Big Trend for Every Business for Current and Next Generation

Big Data: Web Data Processing Problems of Big Data Processing Solutions Processing Web Transaction Data(Unstructured Data) on Data Warehouse (Structured Data Processing Server)

Real World Examples Face Book Papers to Cover Data Warehousing and Analytics Infrastructure at Facebook by Ashish Thusoo (Facebook), et al, in the proceedings of Sigmod 2010 Ebay Data Warehousing and Analytics at Ebay

Data Warehouse that Supports Unstructured Web Data Distributed data warehouse that supports unstructured content and more aspects of extreme data ETL: External Transfer Language SQL-MapReduce SQL language extension using ETL and Stored Procedure that transfers Unstructured Data(Web Data) to Structured Data (Relation:Table) Processing the Converted Relational Database on 100% Parallel Shared Nothing Distributed System

Cloud Computing Introduction of Cloud Computing Public Cloud Private Cloud Hybrid Cloud Architecture 100% Shared Nothing Parallel System Example Amazon Cloud

Assignments -- Tentative 1. Practice on building Star Schema data warehouse 2. Run an existing data mining algorithm with various support and confidence thresholds to deduce the results and conclusions 3. Programming on Building and Parsing an XML document 4. Run a word count task on documents using Map Reduce/Apache Hadoop on a single node system.