Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

Size: px
Start display at page:

Download "Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012"

Transcription

1 Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012

2 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema Relational Aggregation Operators Data Cube Roll Up, Drill Down Papers to Cover An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri (Microsoft) and Umeshwar Dayal (HP Labs), in the proceedings of IEEE 1995 Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross Tab, and SubTotals by Jin Gray (Microsoft), et al, in the proceedings of IEEE 1996

3 Data Mining Introduction Overview of Data Mining Data Cleaning Process Main Concepts Association Rules Support and Confidence APRIORI Algorithm Optimization Techniques of Apriori Algorithm Example Pandora Book to Cover J. Han, M. Kamber (2001), Data Mining, Morgan Kaufmann Publishers, San Francisco, CA

4 Semistructured Data vs Structured Data Structured Data Relational Database, Data Warehouse SQL Semistructured Data XML, HTML XQuery, XPath Unstructured Data Text, Web Data (Mixed of Text, Image, Audio, Video) Problems and Solutions of Processing Semistructured/Unstructured Data Introduction of Mark Up Languages: XML, HTML Papers To Cover (to come)

5 Advanced XML Introduction of XML XML Schema, Semantics, Protocol XQuery, XPath XML Query Processing Papers To Cover XML 1.0(DOM and SAX2 APIs) XQuery 1.0 and XPath 2.0 Semantics XSLT

6 Information Retrieval: How Google Search Engine Works Web data Processing Building Relevancy Metric for Search Engine Example: Google Search Engine Papers to Cover( to come)

7 Map Reduce and Apache Hadoop Introduction of Google Map Reduce Introduction of Apache Hadoop HDFS Architecture Parallel programming and the MapReduce programming model Algorithms of Map Reduce/Apache Hadoop Papers to Cover MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean (Google) and Sanjay Ghemawat (Google) in the proceedings of OSDI 2004 //labs.google.com/papers/mapreduce-osdi04.pdf Lammal, Ralf. Google's MapReduce Programming Model Revisited. Open Source MapReduce Apache Hadoop in White Papers by Apache, Yahoo

8 Data Warehouse on Parallel Processing Architecture of Parallel Processing Flow of Data Processing in Data Warehouse on Parallel Processing Architecture - Retrieval - Aggregation Example: Teradata Parallel Architecture Oracle Parallel Architecture

9 Business Intelligence What is Business Intelligence?? Data Mining on Web Transaction Data of Business Upcoming Big Trend for Every Business for Current and Next Generation

10 Big Data: Web Data Processing Problems of Big Data Processing Solutions Processing Web Transaction Data(Unstructured Data) on Data Warehouse (Structured Data Processing Server)

11 Real World Examples Face Book Papers to Cover Data Warehousing and Analytics Infrastructure at Facebook by Ashish Thusoo (Facebook), et al, in the proceedings of Sigmod 2010 Ebay Data Warehousing and Analytics at Ebay

12 Data Warehouse that Supports Unstructured Web Data Distributed data warehouse that supports unstructured content and more aspects of extreme data ETL: External Transfer Language SQL-MapReduce SQL language extension using ETL and Stored Procedure that transfers Unstructured Data(Web Data) to Structured Data (Relation:Table) Processing the Converted Relational Database on 100% Parallel Shared Nothing Distributed System

13 Cloud Computing Introduction of Cloud Computing Public Cloud Private Cloud Hybrid Cloud Architecture 100% Shared Nothing Parallel System Example Amazon Cloud

14 Assignments -- Tentative 1. Practice on building Star Schema data warehouse 2. Run an existing data mining algorithm with various support and confidence thresholds to deduce the results and conclusions 3. Programming on Building and Parsing an XML document 4. Run a word count task on documents using Map Reduce/Apache Hadoop on a single node system.

Cleveland State University

Cleveland State University Cleveland State University CIS 612 Modern Database Programming & Big Data Processing (3-0-3) Fall 2014 Section 50 Class Nbr. 2670. Tues, Thur 4:00 5:15 PM Prerequisites: CIS 505 and CIS 530. CIS 611 Preferred.

More information

Cleveland State University

Cleveland State University Cleveland State University CIS 612/CIS712 Big Data & Parallel Database Processing Systems (3-0-3) Prerequisites: CIS 530. CIS 611 Preferred. Instructor: Dr. Sunnie S. Chung Office Location: FH 222 Phone:

More information

Cleveland State University

Cleveland State University Cleveland State University CIS 612/CIS712 Big Data & Parallel Database Processing Systems (3-0-3) Prerequisites: CIS 530. CIS 611 Preferred. Instructor: Dr. Sunnie S. Chung Office Location: FH 222 Phone:

More information

Cleveland State University

Cleveland State University Cleveland State University CIS 611/711 Enterprise Databases and Data Warehouse (3-0-3) Prerequisites: CIS430/CIS 530 Instructor: Dr. Sunnie S. Chung Office Location: FH222 Phone: 216 687 4661 Email: sschung.cis@gmail.com

More information

CIS 601 Graduate Seminar in Computer Science Sunnie S. Chung

CIS 601 Graduate Seminar in Computer Science Sunnie S. Chung CIS 601 Graduate Seminar in Computer Science Sunnie S. Chung Research on Topics in Recent Computer Science Research and related papers in the subject that you choose and give presentations in class and

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

1. Introduction to MapReduce

1. Introduction to MapReduce Processing of massive data: MapReduce 1. Introduction to MapReduce 1 Origins: the Problem Google faced the problem of analyzing huge sets of data (order of petabytes) E.g. pagerank, web access logs, etc.

More information

Sql Fact Constellation Schema In Data Warehouse With Example

Sql Fact Constellation Schema In Data Warehouse With Example Sql Fact Constellation Schema In Data Warehouse With Example Data Warehouse OLAP - Learn Data Warehouse in simple and easy steps using Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), Specialized SQL

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Yojna Arora, Dinesh Goyal Abstract: Big Data refers to that huge amount of data which cannot be analyzed by using traditional analytics

More information

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Lectures for the course: Data Warehousing and Data Mining (IT 60107) Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo Evolution of Big Data Architectures@ Facebook Architecture Summit, Shenzhen, August 2012 Ashish Thusoo About Me Currently Co-founder/CEO of Qubole Ran the Data Infrastructure Team at Facebook till 2011

More information

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Sanjay Gandhi G 1, Dr.Balaji S 2 Associate Professor, Dept. of CSE, VISIT Engg College, Tadepalligudem, Scholar Bangalore

More information

After completing this course, participants will be able to:

After completing this course, participants will be able to: Designing a Business Intelligence Solution by Using Microsoft SQL Server 2008 T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s i n - d e p t h k n o w l e d g e o n d e s

More information

Microsoft Analytics Platform System (APS)

Microsoft Analytics Platform System (APS) Microsoft Analytics Platform System (APS) The turnkey modern data warehouse appliance Matt Usher, Senior Program Manager @ Microsoft About.me @two_under Senior Program Manager 9 years at Microsoft Visual

More information

Contents. Part I Setting the Scene

Contents. Part I Setting the Scene Contents Part I Setting the Scene 1 Introduction... 3 1.1 About Mobility Data... 3 1.1.1 Global Positioning System (GPS)... 5 1.1.2 Format of GPS Data... 6 1.1.3 Examples of Trajectory Datasets... 8 1.2

More information

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most

More information

Data Lake Based Systems that Work

Data Lake Based Systems that Work Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu

CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin Presented by: Suhua Wei Yong Yu Papers: MapReduce: Simplified Data Processing on Large Clusters 1 --Jeffrey Dean

More information

Knowledge Discovery in Databases. Databases. date name surname street city account no. payment balance

Knowledge Discovery in Databases. Databases. date name surname street city account no. payment balance Databases date name surname street city account no. payment balance 980103 Jan Novak Dlouha 5 Praha 1 9945371 100.00 100.00 980105 Jan Novak Dlouha 5 Praha 1 9945371 1500.00 1600.00 980106 Jan Novak Dlouha

More information

M. PHIL. COMPUTER SCIENCE (FT / PT) PROGRAMME (For the candidates to be admitted from the academic year onwards)

M. PHIL. COMPUTER SCIENCE (FT / PT) PROGRAMME (For the candidates to be admitted from the academic year onwards) BHARATHIDASAN UNIVERSITY TIRUCHIRAPPALLI 620 024 M. PHIL. COMPUTER SCIENCE (FT / PT) PROGRAMME (For the candidates to be admitted from the academic year 2007-2008 onwards) SEMESTER I COURSE TITLE MARKS

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

Mastering Data Warehouse Aggregates Solutions For Star Schema Performance

Mastering Data Warehouse Aggregates Solutions For Star Schema Performance Mastering Data Warehouse Aggregates Solutions For Star Schema Performance Star Schema The Complete Reference Christopher Adamson Amazon. Mastering Data Warehouse Aggregates, Solutions for Star Schema Performance

More information

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved. Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic

More information

XML: Extensible Markup Language

XML: Extensible Markup Language XML: Extensible Markup Language CSC 375, Fall 2015 XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both. Matthew Might Slides slightly modified

More information

Data Analysis and Data Science

Data Analysis and Data Science Data Analysis and Data Science CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/29/15 Agenda Check-in Online Analytical Processing Data Science Homework 8 Check-in Online Analytical

More information

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG Prof R.Angelin Preethi #1 and Prof J.Elavarasi *2 # Department of Computer Science, Kamban College of Arts and Science for Women, TamilNadu,

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

Chapter 13 XML: Extensible Markup Language

Chapter 13 XML: Extensible Markup Language Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process. MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide

More information

Tutorial Outline. Map/Reduce vs. DBMS. MR vs. DBMS [DeWitt and Stonebraker 2008] Acknowledgements. MR is a step backwards in database access

Tutorial Outline. Map/Reduce vs. DBMS. MR vs. DBMS [DeWitt and Stonebraker 2008] Acknowledgements. MR is a step backwards in database access Map/Reduce vs. DBMS Sharma Chakravarthy Information Technology Laboratory Computer Science and Engineering Department The University of Texas at Arlington, Arlington, TX 76009 Email: sharma@cse.uta.edu

More information

New Technologies for Data Management

New Technologies for Data Management New Technologies for Data Management Chaitan Baru 2 2 Why new technologies? Big Data Characteristics: Volume, Velocity, Variety Began as a Volume problem E.g. Web crawls 1 spb-100 spb in a single cluster

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **

More information

Course no: CSC- 451 Full Marks: Credit hours: 3 Pass Marks: Nature of course: Theory (3 Hrs.) + Lab (3 Hrs.)

Course no: CSC- 451 Full Marks: Credit hours: 3 Pass Marks: Nature of course: Theory (3 Hrs.) + Lab (3 Hrs.) CSC-451 Data Warehousing and Data Mining Course no: CSC- 451 Full Marks: 60+20+20 Credit hours: 3 Pass Marks: 24+8+8 Nature of course: Theory (3 Hrs.) + Lab (3 Hrs.) Course Synopsis: Analysis of advanced

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Evolving To The Big Data Warehouse

Evolving To The Big Data Warehouse Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

Information Integration

Information Integration Chapter 11 Information Integration While there are many directions in which modern database systems are evolving, a large family of new applications fall undei the general heading of information integration.

More information

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India

More information

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko Shark: SQL and Rich Analytics at Scale Michael Xueyuan Han Ronny Hajoon Ko What Are The Problems? Data volumes are expanding dramatically Why Is It Hard? Needs to scale out Managing hundreds of machines

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI Topics 2 Business Intelligence (BI) Decision Support System (DSS) Data Warehouse Online Analytical Processing (OLAP)

More information

Microsoft Big Data and Hadoop

Microsoft Big Data and Hadoop Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common

More information

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters. Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been

More information

New Approaches to Big Data Processing and Analytics

New Approaches to Big Data Processing and Analytics New Approaches to Big Data Processing and Analytics Contributing authors: David Floyer, David Vellante Original publication date: February 12, 2013 There are number of approaches to processing and analyzing

More information

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10 Scalable Web Programming CS193S - Jan Jannink - 2/25/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 7.Analytics 8.Cloud/Map-Reduce 9.Published APIs: (Mar.)*

More information

Oracle Big Data Fundamentals Ed 2

Oracle Big Data Fundamentals Ed 2 Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies

More information

Oracle Database 11g: Data Warehousing Fundamentals

Oracle Database 11g: Data Warehousing Fundamentals Oracle Database 11g: Data Warehousing Fundamentals Duration: 3 Days What you will learn This Oracle Database 11g: Data Warehousing Fundamentals training will teach you about the basic concepts of a data

More information

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY Journal of Analysis and Computation (JAC) (An International Peer Reviewed Journal), www.ijaconline.com, ISSN 0973-2861 International Conference on Emerging Trends in IOT & Machine Learning, 2018 TOOLS

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.01 Fall 2017 Professor Norman White nwhite@stern.nyu.edu normwhite@twitter Teaching Assistant: Frenil Sanghavi fps241@stern.nyu.edu Administrative Assistant:

More information

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMS, with the aim of achieving

More information

A REVIEW DATA CUBE ANALYSIS METHOD IN BIG DATA ENVIRONMENT

A REVIEW DATA CUBE ANALYSIS METHOD IN BIG DATA ENVIRONMENT A REVIEW DATA CUBE ANALYSIS METHOD IN BIG DATA ENVIRONMENT Dewi Puspa Suhana Ghazali 1, Rohaya Latip 1, 2, Masnida Hussin 1 and Mohd Helmy Abd Wahab 3 1 Department of Communication Technology and Network,

More information

Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines

More information

Map Reduce Group Meeting

Map Reduce Group Meeting Map Reduce Group Meeting Yasmine Badr 10/07/2014 A lot of material in this presenta0on has been adopted from the original MapReduce paper in OSDI 2004 What is Map Reduce? Programming paradigm/model for

More information

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

SQL Server Analysis Services

SQL Server Analysis Services DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, SQL Server 2005 Analysis Services SQL Server 2005 Analysis Services - 1 Analysis Services Database and

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Adnan YAZICI Computer Engineering Department

Adnan YAZICI Computer Engineering Department Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection

More information

Chapter 6 VIDEO CASES

Chapter 6 VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Creating Connection With Hive. Version: 16.0

Creating Connection With Hive. Version: 16.0 Creating Connection With Hive Version: 16.0 Copyright 2015 Intellicus Technologies This document and its content is copyrighted material of Intellicus Technologies. The content may not be copied or derived

More information

Big Data The end of Data Warehousing?

Big Data The end of Data Warehousing? Big Data The end of Data Warehousing? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Big data, data warehousing, advanced analytics, Hadoop, unstructured data Introduction If there was an Unwort

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera, How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS

More information

FAST DATA RETRIEVAL USING MAP REDUCE: A CASE STUDY

FAST DATA RETRIEVAL USING MAP REDUCE: A CASE STUDY , pp-01-05 FAST DATA RETRIEVAL USING MAP REDUCE: A CASE STUDY Ravin Ahuja 1, Anindya Lahiri 2, Nitesh Jain 3, Aditya Gabrani 4 1 Corresponding Author PhD scholar with the Department of Computer Engineering,

More information

BI ENVIRONMENT PLANNING GUIDE

BI ENVIRONMENT PLANNING GUIDE BI ENVIRONMENT PLANNING GUIDE Business Intelligence can involve a number of technologies and foster many opportunities for improving your business. This document serves as a guideline for planning strategies

More information

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Keywords: Big Data, Oracle Big Data Appliance, Hadoop, NoSQL, Oracle

More information

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem.

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem. About the Tutorial Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and

More information

Cloud, Big Data & Linear Algebra

Cloud, Big Data & Linear Algebra Cloud, Big Data & Linear Algebra Shelly Garion IBM Research -- Haifa 2014 IBM Corporation What is Big Data? 2 Global Data Volume in Exabytes What is Big Data? 2005 2012 2017 3 Global Data Volume in Exabytes

More information

MapReduce: Algorithm Design for Relational Operations

MapReduce: Algorithm Design for Relational Operations MapReduce: Algorithm Design for Relational Operations Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec Projection π Projection in MapReduce Easy Map over tuples, emit

More information

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City The Complete Reference Christopher Adamson Mc Grauu LlLIJBB New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Contents Acknowledgments

More information

Based on Big Data: Hype or Hallelujah? by Elena Baralis

Based on Big Data: Hype or Hallelujah? by Elena Baralis Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1 3 February 2010 Google detected flu outbreak two weeks ahead of

More information

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES

More information

Data Informatics. Seon Ho Kim, Ph.D.

Data Informatics. Seon Ho Kim, Ph.D. Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate

More information

Big Data Facebook

Big Data Facebook Big Data Architectures@ Facebook QCon London 2012 Ashish Thusoo Outline Big Data @ Facebook - Scope & Scale Evolution of Big Data Architectures @ FB Past, Present and Future Questions Big Data @ FB: Scale

More information

APPLICATION OF HADOOP MAPREDUCE TECHNIQUE TOVIRTUAL DATABASE SYSTEM DESIGN. Neha Tiwari Rahul Pandita Nisha Chhatwani Divyakalpa Patil Prof. N.B.

APPLICATION OF HADOOP MAPREDUCE TECHNIQUE TOVIRTUAL DATABASE SYSTEM DESIGN. Neha Tiwari Rahul Pandita Nisha Chhatwani Divyakalpa Patil Prof. N.B. APPLICATION OF HADOOP MAPREDUCE TECHNIQUE TOVIRTUAL DATABASE SYSTEM DESIGN. Neha Tiwari Rahul Pandita Nisha Chhatwani Divyakalpa Patil Prof. N.B.Kadu PREC, Loni, India. ABSTRACT- Today in the world of

More information

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI Department of Information Technology IT6701 - INFORMATION MANAGEMENT Anna University 2 & 16 Mark Questions & Answers Year / Semester: IV / VII Regulation: 2013

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Big Data in Data Warehouses

Big Data in Data Warehouses doi: 10.17234/INFUTURE.2015.27 Big Data in Data Warehouses Review Summary Tomislav Šubić, Patrizia Poščić, Danijela Jakšić Department of Informatics, University of Rijeka Radmile Matejčić 2, Rijeka, Croatia

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

SCHEME OF TEACHING AND EXAMINATION B.E. (ISE) VIII SEMESTER (ACADEMIC YEAR )

SCHEME OF TEACHING AND EXAMINATION B.E. (ISE) VIII SEMESTER (ACADEMIC YEAR ) SCHEME OF TEACHING AND EXAMINATION B.E. (ISE) VIII SEMESTER (ACADEMIC YEAR 2016-17) Sl Subject Code Subject Credits Hours/Week Examination Marks No Lecture Tutorial Practical CIE SEE Total 1 UIS00XX Elective

More information

Hadoop/MapReduce Computing Paradigm

Hadoop/MapReduce Computing Paradigm Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications

More information

Business Cockpit. Controlling the digital enterprise. Fabio Casati Hewlett-Packard

Business Cockpit. Controlling the digital enterprise. Fabio Casati Hewlett-Packard Business Cockpit Controlling the digital enterprise Fabio Casati Hewlett-Packard UC Berkeley, Oct 4, 2002 page 1 Managing Operational Systems Develop a platform for the semantic management of operational

More information

Parallel Techniques for Big Data. Patrick Valduriez

Parallel Techniques for Big Data. Patrick Valduriez Parallel Techniques for Big Data Patrick Valduriez 1 1 Outline of the Talk Big data: problem and issues Parallel data processing Parallel architectures Parallel techniques Cloud data mgt NoSQL DBMS MapReduce

More information