Challenges and Opportunities with Big Data. By: Rohit Ranjan

Similar documents
REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor:2.114

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V

Some Big Data Challenges

COMP 465 Special Topics: Data Mining

Massively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data

Where do these data come from? What technologies do they use?? Whatever they use, they need models (schemas, metadata, )

Big Data A Growing Technology

McKesson mixes SSDs with HDDs for Optimal Performance and ROI. Bob Fine, Dir., Product Marketing

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Big Data - Some Words BIG DATA 8/31/2017. Introduction

DATA MINING II - 1DL460

Introduction to Data Science

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

BIG DATA & HADOOP: A Survey

Flash Storage Complementing a Data Lake for Real-Time Insight

Metadata, Chief technicolor

Based on Big Data: Hype or Hallelujah? by Elena Baralis

Data Mining Course Overview

Enterprise Data Management in an In-Memory World

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

I am a Data Nerd and so are YOU!

MATH36032 Problem Solving by Computer. Data Science

Introduction to the Mathematics of Big Data. Philippe B. Laval

BIG DATA TESTING: A UNIFIED VIEW

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

data-based banking customer analytics

The Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases

An Emergence Techniques In Big Data Mining

Roadmap for Enterprise System SSD Adoption

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

Active Archive and the State of the Industry

ETL is No Longer King, Long Live SDD

White Paper. The Evolution of RBAC Models to Next-Generation ABAC: An Executive Summary

Embedded Technosolutions

Big Data The New Era of Data

Strong Security Elements for IoT Manufacturing

Optimized Data Integration for the MSO Market

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC

Boundary control : Access Controls: An access control mechanism processes users request for resources in three steps: Identification:

An Emerging Trend of Big data for High Volume and Varieties of Data to Search of Agricultural Data

IBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems

pnfs and Linux: Working Towards a Heterogeneous Future

An introductory look. cloud computing in education

MOHA: Many-Task Computing Framework on Hadoop

Cisco Application Centric Infrastructure and Microsoft SCVMM and Azure Pack

A Review Paper on Big data & Hadoop

PHP Composer 9 Benefits of Using a Binary Repository Manager

BI/DWH Test specifics

Privacy Challenges in Big Data and Industry 4.0

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications. Tutorial at CVPR 2014 June 23rd, 1:00pm-5:00pm, Columbus, OH

Cloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Digital Object Architecture

Copyright 2012 EMC Corporation. All rights reserved.

International Journal of Advance Engineering and Research Development WEATHER SENSING DATA RECOGNIZATION USING HADOOP FRAMEWORK

Introduction to Data Mining and Data Analytics

New Approach to Unstructured Data

Starting small to go Big: Building a Living Database

Top 4 considerations for choosing a converged infrastructure for private clouds

BUYING SERVER HARDWARE FOR A SCALABLE VIRTUAL INFRASTRUCTURE

How a Federated Identity Service Turns Identity into a Business Enabler, Not an IT Bottleneck

Big Data and IoT. Baris Aksanli 02/10/2016

Data Mining & Data Warehouse

DIONYSUS: Towards Query-aware Distributed Processing of RDF Graph Streams

Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE

2015 INFORMATION TECHNOLOGY RESEARCH REVIEW

RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System

Modernizing Healthcare IT for the Data-driven Cognitive Era Storage and Software-Defined Infrastructure

Remarks to the HCI&IM-Sponsored National Workshop on Information Integration. Workshop Deliverables: Roadmap, Hard Problems, and Report

WHAT CIOs NEED TO KNOW TO CAPITALIZE ON HYBRID CLOUD

Dta Mining and Data Warehousing

Composable Infrastructure: Treat Your Infrastructure as Code

The Business Case for Data Warehouse

Middleware, Ten Years In: Vapority into Reality into Virtuality

Building a Data Strategy for a Digital World

DATA WAREHOUSING APPLICATIONS IN AYURVEDA THERAPY

SQL Server New innovations. Ivan Kosyakov. Technical Architect, Ph.D., Microsoft Technology Center, New York

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

V Conclusions. V.1 Related work

Big Data Integration BIG DATA 9/15/2017. Business Performance

Investing in a Better Storage Environment:

Toward a Memory-centric Architecture

Modern Tools Solve Ancient Riddle

IP Core Expertise From WireIE

Unstructured Text in Big Data The Elephant in the Room

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3

Taming the Data Deluge With IBM Information Infrastructure The smart movement and management of information capacity growth without complexity

An Introduction to GPFS

Personal Grid. 1 Introduction. Zhiwei Xu, Lijuan Xiao, and Xingwu Liu

Advanced Digital Storage Solutions

2018 Edition. Security and Compliance for Office 365

NPP & Blockchain Have you thought about the data? Ken Krupa, CTO, MarkLogic

Transforming the Data Center into the Information Center. Jack Domme Chief Executive Officer Hitachi Data Systems

Research Data Management & Preservation: A Library Perspective

Cisco EnergyWise Optimize and Cost Saving. Traditional IT Power Management

Cisco Gains Real-time Visibility in the Business with SAP HANA

Knowledge Discovery & Data Mining

PRIVACY AND ONLINE DATA: CAN WE HAVE BOTH?

ISM 50 - Business Information Systems

Transcription:

Challenges and Opportunities with Big Data By: Rohit Ranjan

Introduction What is Big Data? Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them (Wiki Definition) Usually data >1Terabytes is considered Big-Data In 2010, 13 Exabyte (13*1024*1024 Terabytes) of new data was generated by users and enterprises Potential value of only global personal location is more than $700 Billion

Why is Big Data Important? It can help us with: Cost reductions for industry and users Time reductions, New product development and optimized offerings, and Smart decision making New Research in Medical, Life Sciences, Astronomy Fraud Detection, Security

Why do we have Big-Data Problem? Too Many bytes (Volume) Too high a rate (Velocity) Too many sources (Variety) Non Scalable Source Human-Intensive Compute Intensive

Big Data Analysis Pipeline

Phases in Processing Pipeline Data Acquisition and Recording Information Extraction and Cleaning Data Integration, Aggregation, and Representation Query Processing, Data Modeling, and Analysis Interpretation

Data Acquisition and Recording Mainly Two challenges in Data Acquisition: Define filters in such a a way that discard non-useful data and do not discard any useful data Automatically generate the right metadata to describe what data is recorded, how it is recorded and measured

Information Extraction and Cleaning Information collected will not be in a format ready for analysis We require an information extraction process that pulls out required information and express it in structured format

Data Integration, Aggregation, and Representation Storing, Identifying, and forming right meta-data should happen in an automated manner (i.e intelligent database design) We must enable professionals to create effective effective database design, so that even in absence of intelligent database design, they can create effective databases

Query Processing, Data Modeling, and Analysis Querying and Mining Big Data are different from traditional statistical analysis It s noisy, dynamic, heterogeneous and untrustworthy Problem with Big Data analysis is lack of coordination between database systems Tight coupling between database system will benefit expressiveness and analysis

Interpretation Ability to analyze Big-Data is of limited value, if users can t understand Rich palette of visualization is important in conveying users, the results of queries in a way, that s best understood in particular domain

Challenges in Big Data Analysis Heterogeneity and Incompleteness Scale Timeliness Privacy Human Collaboration

Heterogeneity and Incompleteness Machine expect homogenous data, and cannot understand nuance Even after data cleaning, and error correction some incompleteness and error in data remains

Scale Data Volume is scaling faster than compute resources Computing are shifting to parallel processor, so now we have to deal with parallelism within single node Cloud Computing: It aggregates multiple disparate workloads, with varying performance goals Storage devices: Shift from HDD to Solid State drives, it requires rethinking of how we design storage subsystem

Timeliness Design a system that can process given size of data faster Example: In credit card fraud we should be able to detect fraud in timely manner.

Privacy Privacy is huge concern in today s big-data system Design a system that can give user fine-grained control over sharing

Human Collaboration Many patterns are easily detected by humans but computer algorithms have hard time finding it We can incorporate human and machine together to get a better visualization of data

System Architecture New kind of System Architecture is needed to process Big-Data Big-Data Typically involves multiple phases (from gathering data, extraction, data cleaning, statistical modelling etc.) Current system offer little to no support for Big-Data Pipeline

Thank You!!

Question and Answer?