Sensor Data Collection and Processing

1 Sensor Data Collection and Processing: Applying Web Scale to Sensor Data

2 Today's Speaker: Josh Patterson
- josh@cloudera.com /
- Master's thesis: self-organizing mesh networks
  - Published in IAAI-09: TinyTermite: A Secure Routing Algorithm
- Conceived, built, and led Hadoop integration for the openPDC project at TVA (Smartgrid stuff)
- Led a small team that designed classification techniques for time series and MapReduce
- Open source work at
- Now: Solutions Architect at Cloudera

3 NERC Sensor Data Collection
- openPDC PMU data collection circa
- Sensors
- 30 samples/second
- 4.3B samples/day
- Housed in Hadoop

4 Major Themes From openPDC
- How much is coming in?
  - Too much to make SAN storage cost effective!
  - Planned for ½ petabyte of data storage
- OK. So, then, where can this data live?
  - Not at Amazon! Regulations, etc.
  - Also: for fun, price ½ petabyte of storage at Amazon
- Enter Hadoop
  - Linear scaling of storage in both space and cost
  - Also had that handy MapReduce thing included

5 Apache Hadoop: Open Source Distributed Storage and Processing Engine
- MapReduce, Hadoop Distributed File System (HDFS)
- Consolidates mixed data
  - Move complex and relational data into a single repository
- Stores inexpensively
  - Keep raw data always available
  - Use industry-standard hardware
- Processes at the source
  - Eliminate ETL bottlenecks
  - Mine data first, govern later

6 What Hadoop Does
- Networks industry-standard hardware nodes together to combine compute and storage into a scalable distributed system
- Scales to petabytes without modification
- Manages fault tolerance and data replication automatically
- Processes semi-structured and unstructured data easily
- Supports MapReduce natively to analyze data in parallel
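To make "analyze data in parallel" concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps applied to sensor readings. It is plain Python, not Hadoop's API; the sensor names, readings, and function names are all hypothetical and chosen only for illustration. On a real cluster the map and reduce functions would run on many nodes, and the framework would handle the grouping step.

```python
from collections import defaultdict

# Hypothetical (sensor_id, reading) records; not data from the talk.
SAMPLES = [
    ("pmu-a", 59.98),
    ("pmu-b", 60.02),
    ("pmu-a", 60.00),
    ("pmu-b", 60.04),
    ("pmu-a", 60.02),
]


def map_sample(record):
    """Map step: emit (key, value) pairs, here (sensor_id, (reading, 1))."""
    sensor_id, reading = record
    yield sensor_id, (reading, 1)


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(sensor_id, values):
    """Reduce step: combine the (reading, count) pairs into a mean."""
    total = sum(reading for reading, _ in values)
    count = sum(n for _, n in values)
    return sensor_id, total / count


def run_job(records):
    """Run map, shuffle, and reduce sequentially in one process."""
    mapped = (pair for record in records for pair in map_sample(record))
    grouped = shuffle(mapped)
    return dict(reduce_mean(key, values) for key, values in grouped.items())


MEANS = run_job(SAMPLES)
print(MEANS)
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be split across machines without changing the functions themselves.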

7 It's About More Than Just Collection
- Scenario
  - 1 million sensors, each collecting one sample every 5 minutes
  - 5-year retention policy
  - Storage needs of 15 TB
  - Reliability and availability?
- Processing
  - Single machine: 15 TB takes 2.2 days to scan
  - We'd like to do a lot more than simple scans!
  - 20 nodes: same task takes 11 minutes
  - Also can use a parallel programming model: MapReduce
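The scenario's numbers can be checked with a few lines of arithmetic. The sketch below only restates the slide's figures; treating 15 TB as decimal terabytes and a year as 365 days are assumptions, and the variable names are illustrative.

```python
SENSORS = 1_000_000
SAMPLES_PER_HOUR = 12        # one sample every 5 minutes
RETENTION_YEARS = 5
TOTAL_BYTES = 15e12          # 15 TB, assuming decimal terabytes
NODES = 20

# Samples retained over the full policy window (assuming 365-day years).
total_samples = SENSORS * SAMPLES_PER_HOUR * 24 * 365 * RETENTION_YEARS
bytes_per_sample = TOTAL_BYTES / total_samples

# Scan throughput implied by "2.2 days on a single machine".
single_scan_seconds = 2.2 * 24 * 3600
single_rate_mb_s = TOTAL_BYTES / single_scan_seconds / 1e6

# Aggregate and per-node throughput implied by "11 minutes on 20 nodes".
cluster_scan_seconds = 11 * 60
cluster_rate_gb_s = TOTAL_BYTES / cluster_scan_seconds / 1e9
per_node_rate_gb_s = cluster_rate_gb_s / NODES

speedup = single_scan_seconds / cluster_scan_seconds

print(f"samples retained:   {total_samples:,}")
print(f"bytes per sample:   {bytes_per_sample:.1f}")
print(f"single machine:     {single_rate_mb_s:.1f} MB/s")
print(f"20-node cluster:    {cluster_rate_gb_s:.1f} GB/s total, "
      f"{per_node_rate_gb_s:.2f} GB/s per node")
print(f"speedup:            {speedup:.0f}x")
```

The implied speedup is much larger than the node count, so the two figures cannot describe the same per-machine scan rate. One plausible reading, not stated on the slide, is that the single-machine number reflects one disk while each cluster node reads from several disks in parallel.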

8 Unstructured Data Explosion
[Figure labels: (You); Complex, Unstructured; Relational]
- 2,500 exabytes of new information in 2012, with the Internet as primary driver
- Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
- Source: IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May

9 The Cloud
- The Legend: everything just works in the cloud
- The Myth: cloud computing is a new technology
- The Reality: cloud computing is just more advanced network-based applications
  - Not all cloud services are equal; caveat emptor

10 Scientific American on Cloud Computing
- Much of what makes cloud computing tick (internet, mobile computers, networked data storage, ...) has been available since the beginning of the dot-com era more than a decade ago.
- What is new, or at least more recent, is the greater variety of content that can be delivered online to a wider variety of gadgets.

11 As It Turns Out
- The cloud is just some place in Northern Virginia
- Business Insider: Lessons Learned From AMZN Failure
  - Amazon is not infallible, and the cloud is not magic.
  - Amazon is not the only IaaS provider, and your application should be able to run on more than one.
  - Cloud deployments must be automated and should take cloud server reliability characteristics into account.
- Read more: the-right-lessons-from-the-amazon-outage #ixzz1l4gczcsu

12 Things to Think About
- Can I really afford to be locked into a proprietary cloud technology long term?
  - Open source is coming of age in the enterprise
- The market for data analysis is exploding
  - Can I use my technology to process this data at scale, and process said data fast?
- Reliable storage as a serious cost consideration
  - What's a terabyte cost on this platform?
  - What's a petabyte cost on this platform?

13 Hadoop Adoption

14 Takeaways
- Not all data can go into the cloud
  - Smart grid data is sensitive, needs private cloud
- Caveat emptor
  - You can't just move everything to the cloud
  - Not all cloud tech is of the same reliability
- Consider
  - Speed at scale as the killer app
  - Cost at scale, cost of lock-in

15 Questions?
- Cloudera's Distribution including Apache Hadoop (CDH):
- Resources
- Timeseries blog article


More information

Microsoft Exam

Microsoft Exam Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred

More information

Public, Private, or Hybrid Cloud

Public, Private, or Hybrid Cloud White Paper Public, Private, or Hybrid Cloud www.rapidscale.net 1 Public, Private, or Hybrid Cloud When it comes to business, cloud computing is on everyone s mind. This next generation of computing technology

More information

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been

More information

Backtesting with Spark

Backtesting with Spark Backtesting with Spark Patrick Angeles, Cloudera Sandy Ryza, Cloudera Rick Carlin, Intel Sheetal Parade, Intel 1 Traditional Grid Shared storage Storage and compute scale independently Bottleneck on I/O

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since

More information

BUSTED! 5 COMMON MYTHS OF MODERN INFRASTRUCTURE. These Common Misconceptions Could Be Holding You Back

BUSTED! 5 COMMON MYTHS OF MODERN INFRASTRUCTURE. These Common Misconceptions Could Be Holding You Back BUSTED! 5 COMMON MYTHS OF MODERN INFRASTRUCTURE These Common Misconceptions Could Be Holding You Back 2 IT Is Facing a New Set of Challenges As technology continues to evolve, IT must adjust to changing

More information

Design of Hadoop-based Framework for Analytics of Large Synchrophasor Datasets

Design of Hadoop-based Framework for Analytics of Large Synchrophasor Datasets Available online at www.sciencedirect.com Procedia Computer Science 12 (2012 ) 254 258 Complex Adaptive Systems, Publication 2 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University

More information

The Hadoop Paradigm & the Need for Dataset Management

The Hadoop Paradigm & the Need for Dataset Management The Hadoop Paradigm & the Need for Dataset Management 1. Hadoop Adoption Hadoop is being adopted rapidly by many different types of enterprises and government entities and it is an extraordinarily complex

More information

GDPR Data Discovery and Reporting

GDPR Data Discovery and Reporting GDPR Data Discovery and Reporting PRODUCT OVERVIEW The GDPR Challenge The EU General Data Protection Regulation (GDPR) is a regulation mainly concerned with how data is captured and retained, and how organizations

More information

WHITE PAPER: TOP 10 CAPABILITIES TO LOOK FOR IN A DATA CATALOG

WHITE PAPER: TOP 10 CAPABILITIES TO LOOK FOR IN A DATA CATALOG WHITE PAPER: TOP 10 CAPABILITIES TO LOOK FOR IN A DATA CATALOG The #1 Challenge in Successfully Deploying a Data Catalog The data cataloging space is relatively new. As a result, many organizations don

More information

Approaching the Petabyte Analytic Database: What I learned

Approaching the Petabyte Analytic Database: What I learned Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may

More information

Hadoop, Yarn and Beyond

Hadoop, Yarn and Beyond Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets

More information

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT. Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem

More information

Everything you need to know about cloud. For companies with people in them

Everything you need to know about cloud. For companies with people in them Everything you need to know about cloud For companies with people in them You used to know where you stood with the word cloud. It meant those fluffy things floating above you, bringing shade and rain,

More information

Large-Scale Data Engineering. Overview and Introduction

Large-Scale Data Engineering. Overview and Introduction Large-Scale Data Engineering Overview and Introduction Administration Blackboard Page Announcements, also via email (pardon html formatting) Practical enrollment, Turning in assignments, Check Grades Contact:

More information

Integrating Advanced Analytics with Big Data

Integrating Advanced Analytics with Big Data Integrating Advanced Analytics with Big Data Ian McKenna, Ph.D. Senior Financial Engineer 2017 The MathWorks, Inc. 1 The Goal SCALE! 2 The Solution tall 3 Agenda Introduction to tall data Case Study: Predicting

More information