Based on Big Data: Hype or Hallelujah? by Elena Baralis

Similar documents
Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

BIG DATA TESTING: A UNIFIED VIEW

Big Data - Some Words BIG DATA 8/31/2017. Introduction

Introduction to Data Mining and Data Analytics

Introduction to Big Data

D B M G Data Base and Data Mining Group of Politecnico di Torino

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

BlueDBM: An Appliance for Big Data Analytics*

Challenges for Data Driven Systems

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

CISC 7610 Lecture 2b The beginnings of NoSQL

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY

SpagoBI and Talend jointly support Big Data scenarios

Big Data and IoT. Baris Aksanli 02/10/2016

BIG DATA & HADOOP: A Survey

Data mining fundamentals

Big Data A Growing Technology

Some Big Data Challenges

TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management

Big Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition

<Insert Picture Here> Introduction to Big Data Technology

PMINJ Chapter 01 May Symposium Cloud Computing and Big Data. What s the Big Deal?

Data Mining Concepts & Tasks

Wearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life

SLIPO. Scalable Linking and Integration of Big POI data. Giorgos Giannopoulos IMIS/Athena RC

DIONYSUS: Towards Query-aware Distributed Processing of RDF Graph Streams

Broad Learning via Fusion of Heterogeneous Information

I am a Data Nerd and so are YOU!

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

Review - Relational Model Concepts

Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

Dealing with Data Especially Big Data

MATH36032 Problem Solving by Computer. Data Science

745: Advanced Database Systems

Visual Analytics Sandbox: A big data platform for processing network traffic

Big Data The end of Data Warehousing?

Chapter 6 VIDEO CASES

An Overview of Smart Sustainable Cities and the Role of Information and Communication Technologies (ICTs)

Fall 2017 ECEN Special Topics in Data Mining and Analysis

DATA MINING II - 1DL460

AT&T Labs Research Bell Labs/Lucent Technologies Princeton University Rensselaer Polytechnic Institute Rutgers, the State University of New Jersey

Accelerating Digital Transformation with InterSystems IRIS and vsan

Hadoop, Yarn and Beyond

Networking Cyber-physical Applications in a Data-centric World

Big data. Professor Dan Ariely, Duke University.

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

Lecture 25 Overview. Last Lecture Query optimisation/query execution strategies

Big Data with Hadoop Ecosystem

An introduction to Big Data. Presentation by Devesh Sharma, Zubair Asghar & Andreas Aalsaunet

CSE 626: Data mining. Instructor: Sargur N. Srihari. Phone: , ext. 113

Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data

A data-driven framework for archiving and exploring social media data

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

Flash Storage Complementing a Data Lake for Real-Time Insight

Data Analytics at Logitech Snowflake + Tableau = #Winning

Embedded Technosolutions

An Implementation of Fog Computing Attributes in an IoT Environment

IoT and Edge Computing. Satyam Vaghani VP & GM, IOT & AI

TESTING BIG DATA WORLD RIGA. by Konstantin Pletenev OCTOBER, 2017, TAPOST GROW CONFIDENTLY

SMART CITIES AND BIG DATA: CHALLENGES AND OPPORTUNITIES

RethinkDB. Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu

Challenges and Opportunities with Big Data. By: Rohit Ranjan

Data Analytics Framework and Methodology for WhatsApp Chats

Online Bill Processing System for Public Sectors in Big Data

Fully Converged Cloud Storage

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice

Webinar Series TMIP VISION

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

New Challenges in Big Data: Technical Perspectives. Hwanjo Yu POSTECH

Intelligent Edge Computing and ML-based Traffic Classifier. Kwihoon Kim, Minsuk Kim (ETRI) April 25.

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

Introduction to Data Science

Decentralized Distributed Storage System for Big Data

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Cisco Tetration Analytics Platform: A Dive into Blazing Fast Deep Storage

Computer-based Tracking Protocols: Improving Communication between Databases

An Emerging Trend of Big data for High Volume and Varieties of Data to Search of Agricultural Data

BIG DATA ANALYTICS IN HEALTHCARE: CHALLENGES, TOOLS AND TECHNIQUES : A SURVEY

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

Decision models for the Digital Economy

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Hadoop/MapReduce Computing Paradigm

Keys to Your Web Presence Checklist

DT-ICT : Big data solutions for energy

Semantic Technologies for the Internet of Things: Challenges and Opportunities

Introduction to MapReduce (cont.)

EUROSTAT and BIG DATA. High Level Seminar on integrating non traditional data sources in the National Statistical Systems

Introduction to the Mathematics of Big Data. Philippe B. Laval

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Massive Data Analysis

Event Detection through Differential Pattern Mining in Internet of Things

3 Data, Data Mining. Chengkai Li

A Review Paper On Data Mining with Big data analysis algorithms, Tools, Applications and Challenges

Modern Database Concepts

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391

Transcription:

Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1

3 February 2010 Google detected flu outbreak two weeks ahead of CDC data (Centers for Disease Control and Prevention U.S.A) Based on the analysis of Google search queries 4 2

Internet live stats http://www.internetlivestats.com/ 5 User Generated Content (Web & Mobile) E.g., Facebook, Instagram, Yelp, TripAdvisor, Twitter, YouTube Health and scientific computing 6 3

Log files Web server log files, machine system log files Internet Of Things (IoT) Sensor networks, RFID, smart meters 7 Crowdsourcing Map data Computing Sensing Real time traffic info 8 4

Many different definitions Data whose scale, diversity and complexity require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it 9 Many different definitions Data whose scale, diversity and complexity require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it 10 5

Many different definitions Data whose scale, diversity and complexity require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it 11 The 3Vs of big data Volume: scale of data Variety: different forms of data Velocity: analysis of streaming data but also Veracity: uncertainty of data Value: exploit information provided by data 12 6

Volume Data volume increases exponentially over time 44x increase from 2009 to 2020 Digital data 35 ZB in 2020 13 Variety Various formats, types and structures Numerical data, image data, audio, video, text, time series A single application may generate many different formats Heterogeneous data Complex data integration problem 14 7

Velocity Fast data generation rate Streaming data Very fast data processing to ensure timeliness 15 Veracity Data quality 16 8

Value Translate data into business advantage 17 Generation Acquisition Storage Analysis Generation Passive recording Typically structured data Bank trading transactions, shopping records, government sector archives Active generation Semistructured or unstructured data User-generated content, e.g., social networks Automatic production Location-aware, context-dependent, highly mobile data Sensor-based Internet-enabled devices 18 9

Generation Acquisition Storage Analysis Acquisition Collection Pull-based, e.g., web crawler Push-based, e.g., video surveillance, click stream Transmission Transfer to data center over high capacity links Preprocessing Integration, cleaning, redundancy elimination 19 Generation Acquisition Storage Analysis Storage Storage infrastructure Storage technology, e.g., HDD, SSD Networking architecture, e.g., DAS, NAS, SAN Data management File systems (HDFS), key-value stores (Memcached), column-oriented databases (Cassandra), document databases (MongoDB) Programming models Map reduce, stream processing, graph processing 20 10

Generation Acquisition Storage Analysis Analysis Objectives Descriptive analytics, predictive analytics, prescriptive analytics Methods Statistical analysis, data mining, text mining, network and graph data mining Clustering, classification and regression, association analysis Diverse domains call for customized techniques 21 Technology and infrastructure New architectures, programming paradigms and techniques are needed Data management and analysis New emphasis on data Data science 22 11

Processors process data Hard drives store data We need to transfer data from the disk to the processor 23 Transfer the processing power to the data Multiple distributed disks Each one holding a portion of a large dataset Process in parallel different file portions from different disks 24 12