I am a Data Nerd and so are YOU!
|
|
- Veronica Dawson
- 5 years ago
- Views:
Transcription
1 I am a Data Nerd and so are YOU!
2
3
4
5
6
7
8 Not This Type of Nerd
9
10
11 Data Nerd Coffee Talk We saw Cloudera as the lone open source champion of Hadoop and the EMC/Greenplum/MapR initiative as a more closed and proprietary long shot We are the only real-time data store that combines fine-grained security controls, scalability to the 10s of petabytes, flexible schemas for unstructured and semi-structured information, and diverse analytical capabilities I am working on a framework to allow construction of large-scale analytical queries on unstructured data The IoT and big data are clearly growing apace, and are set to transform many areas everyday life. But which particular sectors are likely to feel the IoT/big data disruption first?
12 What is Big Data?
13 What is Big Data? Big data is like teenage sex: Everyone talks about it, Nobody really knows how to do it, Everyone thinks everyone else is doing it, So everyone claims they are doing it -Dan Ariely
14 40 ZETTABYTES 2.5 Quintillion Bytes of data will be created by 2020, an increase of 300 times from BILLION PEOPLE have cell phones of data created each day Volume SCALE OF DATA Most companies in the US have at least 100 TERABYTES of data stored The Four V s of BIG DATA: Global healthcare data size estimated to be 150 EXABYTES Variety DIFFERENT FORMS OF DATA 400 MILLION TWEETS are sent per day by about 200 million monthly users By 2015, it s anticipated there will be 420 MILLION WEARABLE, WIRELESS HEALTH MONITORS 4 BILLION+ HOURS OF VIDEO are watched on YouTube each month NYSE captures 1 TB OF TRADE INFORMATION during each trading session Velocity STREAMING DATA Source: McKinsey Global Institute Modern cares have close to 100 SENSORS that monitor fuel levels, tire pressure, etc Problems that are unsolvable using traditional tools 1 IN 3 BUSINESS LEADERS don t trust the information they user to make decisions Veracity UNCERTAINTY OF DATA Poor data quality costs the US economy around $3.1 TRILLION A YEAR 27% OF RESPONDENTS were unsure of how much of their data was inaccurate
15 Data Example: Structured Relational
16 Data Example: Structured XML/HL7
17 Data Example: Unstructured Log File
18 Data Example: Unstructured Tweet
19 Data Example: Unstructured Sensor
20 Data Example: Unstructured Image
21 Summary of Common Clinical Data ICD CPT Lab Medication Clinical Notes Availability High High High Medium Medium Recall Medium Poor Medium Precision Medium High High Format Structured Structured Pros Cons Easy to work with, a good approximation of disease Disease code often used for screening, there fore disease might not be there Easy to work with, high precision Missing data Mostly structured High data validity Data normalization and ranges Inpatient: High Outpatient: Variable Inpatient: High Outpatient: Variable Structured and unstructured High data validity Prescribed not necessarily taken Medium Medium high Unstructured More details about doctors thoughts Difficult to process Source: Joshua Denny: Mining Electronic Health Records
22 Hadoop Hadoop Distributed File System (HDFS) Commodity Hardware Files stored as blocks Reliability achieved through replication (clusters) Master Node [few] JobTracker: Distributes Map/Reduce Tasks TaskTracker: Receives Map/Reduce Tasks NameNode: Coordinate meta-data DataNode: Stores & replicates data across cluster Worker Node: Processing Power [many] No data caching Open-Source Framework Allows for Multiple Programming Language Parallel processing Task-based programming logic Components will fail at high rate (and that s ok) Map / Reduce (aka Divide and Conquer) Data will be contained in a relatively small number of big files Data files are write-once Lots of streaming reas Higher sustained throughput across large amounts of data Distributions & Programming Tools
23 Hadoop in the Enterprise SCIENCE Medical imaging, sensor data, genome sequencing, EHR XML Environment Enterprise INDUSTRY Financial, Pharma, manufacturing, insurance, fraud, retail LEGACY Claims, sales, customer behavior, product databases, accounting CSV EDI LOG SQL TXT JSON BIN JPEG Create Map Reduce Commodity Server Cloud Import Dashboards Business Intelligence Applications SYSTEM DATA Log files, health feeds, activity streams, network messages, web analytics OBJ KLM Hadoop Distributed File System (HDFS) RDBMS 1 High Volume 2 MapReduce Process 3 Data Flows Consumer Results Source:
24
25 Data Science Big Data needs Data Science but Data Science doesn t need Big Data Carla Gentry
26 Dr. Ashenfelter * winter rainfall * average growing season temperature = * harvest rainfall
27 Analytic Complexity Analytic Approaches Small amount of data or samples (MB/GB) Large amount of data (TB/PB) Advanced Analytics Smaller data sets using advanced techniques to analyze impact of future scenarios Big-Data Analytics Can fuse different data types on a massive scale resulting in predictive and real-time analysis capabilities Predictive & Real-time analytic capabilities Basic Analytics Relies on historical observations to help avoid past mistakes and duplicate past successes Big-Data Computing Data become more consolidated while analytic work flows are more streamlined and automated Accurate historical observations Size of data Source: Booz Allen Hamilton
28 Matthew 5:5 (Ultra-Revised Standard Version) And the Data Nerds Shall Inherit the Earth
29 Lucas M Tramontozzi SCI Solutions ltramontozz@scisolutions.com
30 I am a Data Nerd and so are YOU!
What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?
Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation
More information<Insert Picture Here> Introduction to Big Data Technology
Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into
More informationCloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationBig data. Professor Dan Ariely, Duke University.
Big data BIG DATA is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it... Professor Dan Ariely,
More informationInternational Journal of Advance Engineering and Research Development. A Study: Hadoop Framework
Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja
More informationHadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391
Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Outline Big Data Big Data Examples Challenges with traditional storage NoSQL Hadoop HDFS MapReduce Architecture 2 Big Data In information
More informationBIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,
BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 1 OBJECTIVES ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 2 WHAT
More informationECEN Security and Privacy for Big Data. Introduction Professor Yanmin Gong 08/22/2017
ECEN 5060 - Security and Privacy for Big Data Introduction Professor Yanmin Gong 08/22/2017 Administrivia Course Hour: T/R 3:30-4:45 pm @ CLB 101 Office Hour: T/R 2:30-3:30 pm Any question besides assignment
More informationBig Data - Some Words BIG DATA 8/31/2017. Introduction
BIG DATA Introduction Big Data - Some Words Connectivity Social Medias Share information Interactivity People Business Data Data mining Text mining Business Intelligence 1 What is Big Data Big Data means
More informationMassively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data
Big Data Really Fast A Proven In-Memory Analytical Processing Platform for Big Data 2 Executive Summary / Overview: Big Data can be a big headache for organizations that have outgrown the practicality
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationModern Database Concepts
Modern Database Concepts Introduction to the world of Big Data Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz What is Big Data? buzzword? bubble? gold rush? revolution? Big data is like teenage
More informationBig Data Issues for Federal Records Managers
Big Data Issues for Federal Records Managers ARMA Metro Conference April 26, 2017 Lisa Haralampus Director, Federal Records Management Policy and Outreach Section Office of the Chief Records Officer for
More informationBig Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela
More informationWhen, Where & Why to Use NoSQL?
When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),
More informationSensor Data Collection and Processing
Sensor Data Collection and Processing Applying Web Scale To Sensor Data Today s speaker Josh Patterson josh@cloudera.com / twitter: @jpatanooga Master s Thesis: self-organizing mesh networks Published
More information2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice
2014 年 3 月 13 日星期四 From Big Data to Big Value Infrastructure Needs and Huawei Best Practice Data-driven insight Making better, more informed decisions, faster Raw Data Capture Store Process Insight 1 Data
More informationBull Fast Track/PDW and Big Data
Bull Fast Track/PDW and Big Data Add High Performance BI to your Big Data Roger Van Unen Expert Microsoft / BI roger.van-unen@bull.net http://www.bull.fr/bi/fastrack.html Michael Schmitter BI Sales Germany
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationMassive Online Analysis - Storm,Spark
Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R
More informationFast Innovation requires Fast IT
Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:
More informationCONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM
CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications
More informationOracle NoSQL Database Overview Marie-Anne Neimat, VP Development
Oracle NoSQL Database Overview Marie-Anne Neimat, VP Development June14, 2012 1 Copyright 2012, Oracle and/or its affiliates. All rights Agenda Big Data Overview Oracle NoSQL Database Architecture Technical
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationBig Data The end of Data Warehousing?
Big Data The end of Data Warehousing? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Big data, data warehousing, advanced analytics, Hadoop, unstructured data Introduction If there was an Unwort
More informationSpatial Analytics Built for Big Data Platforms
Spatial Analytics Built for Big Platforms Roberto Infante Software Development Manager, Spatial and Graph 1 Copyright 2011, Oracle and/or its affiliates. All rights Global Digital Growth The Internet of
More informationBuilding an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle
Building an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to
More informationBIG DATA TESTING: A UNIFIED VIEW
http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation
More informationFrom Internet Data Centers to Data Centers in the Cloud
From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs
More informationA Survey on Big Data
A Survey on Big Data D.Prudhvi 1, D.Jaswitha 2, B. Mounika 3, Monika Bagal 4 1 2 3 4 B.Tech Final Year, CSE, Dadi Institute of Engineering & Technology,Andhra Pradesh,INDIA ---------------------------------------------------------------------***---------------------------------------------------------------------
More informationDeep Storage for Exponential Data. Nathan Thompson CEO, Spectra Logic
Deep Storage for Exponential Data Nathan Thompson CEO, Spectra Logic HISTORY Partnered with Fujifilm on a variety of projects HQ in Boulder, 35 years of business Customers in 54 countries Spectra builds
More informationGlobal Journal of Engineering Science and Research Management
A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV
More informationAcquiring Big Data to Realize Business Value
Acquiring Big Data to Realize Business Value Agenda What is Big Data? Common Big Data technologies Use Case Examples Oracle Products in the Big Data space In Summary: Big Data Takeaways
More informationStoring in the Cloud: What You Need to Know Bret Piatt Rackspace Hosting
Storing in the Cloud: What You Need to Know Bret Piatt Rackspace Hosting SAFE HARBOR STATEMENT This presentation contains "forward-looking" statements that involve risks, uncertainties and assumptions.
More informationData Management Glossary
Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative
More informationApproaching the Petabyte Analytic Database: What I learned
Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may
More informationHADOOP FRAMEWORK FOR BIG DATA
HADOOP FRAMEWORK FOR BIG DATA Mr K. Srinivas Babu 1,Dr K. Rameshwaraiah 2 1 Research Scholar S V University, Tirupathi 2 Professor and Head NNRESGI, Hyderabad Abstract - Data has to be stored for further
More informationDistributed Systems CS6421
Distributed Systems CS6421 Intro to Distributed Systems and the Cloud Prof. Tim Wood v I teach: Software Engineering, Operating Systems, Sr. Design I like: distributed systems, networks, building cool
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationMI-PDB, MIE-PDB: Advanced Database Systems
MI-PDB, MIE-PDB: Advanced Database Systems http://www.ksi.mff.cuni.cz/~svoboda/courses/2015-2-mie-pdb/ Lecture 10: MapReduce, Hadoop 26. 4. 2016 Lecturer: Martin Svoboda svoboda@ksi.mff.cuni.cz Author:
More informationProcessing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.
Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most
More informationA Survey on Comparative Analysis of Big Data Tools
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,
More informationIntroduction to the Mathematics of Big Data. Philippe B. Laval
Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2017 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,
More informationBigData And the Zoo. Mansour Raad Federal GIS Conference 2014
Federal GIS Conference 2014 February 10 11, 2014 Washington DC BigData And the Zoo Mansour Raad http://thunderheadxpler.blogspot.com/ mraad@esri.com @mraad What is BigData? Big data is like teenage sex:
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationA Review Approach for Big Data and Hadoop Technology
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 A Review Approach for Big Data and Hadoop Technology Prof. Ghanshyam Dhomse
More informationIntegrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers
Oracle zsig Conference IBM LinuxONE and z System Servers Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers Sam Amsavelu Oracle on z Architect IBM Washington
More informationBased on Big Data: Hype or Hallelujah? by Elena Baralis
Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1 3 February 2010 Google detected flu outbreak two weeks ahead of
More informationBig Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.
Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationBig Data and Cloud Computing
Big Data and Cloud Computing Presented at Faculty of Computer Science University of Murcia Presenter: Muhammad Fahim, PhD Department of Computer Eng. Istanbul S. Zaim University, Istanbul, Turkey About
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationChapter 6 VIDEO CASES
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationThe Google File System. Alexandru Costan
1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems
More informationSTATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns
STATS 700-002 Data Analysis using Python Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns Unit 3: parallel processing and big data The next few lectures will focus on big
More informationA REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,
More informationIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce Who Am I - Ryan Tabora - Data Developer at Think Big Analytics - Big Data Consulting - Experience working with Hadoop, HBase, Hive, Solr, Cassandra, etc. 2 Who Am I -
More informationMaximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.
Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. January 2016 Credit Card Fraud prevention is among the most time-sensitive and high-value of IT tasks. The databases
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationA Single Source of Truth
A Single Source of Truth is it the mythical creature of data management? In the world of data management, a single source of truth is a fully trusted data source the ultimate authority for the particular
More informationTop 25 Big Data Interview Questions And Answers
Top 25 Big Data Interview Questions And Answers By: Neeru Jain - Big Data The era of big data has just begun. With more companies inclined towards big data to run their operations, the demand for talent
More informationHow Insurers are Realising the Promise of Big Data
How Insurers are Realising the Promise of Big Data Jason Hunter, CTO Asia-Pacific, MarkLogic A Big Data Challenge: Pushing the Limits of What's Possible The Art of the Possible Multiple Government Agencies
More information2017 GridGain Systems, Inc. In-Memory Performance Durability of Disk
In-Memory Performance Durability of Disk Meeting the Challenges of Fast Data in Healthcare with In-Memory Technologies Akmal Chaudhri Technology Evangelist GridGain Agenda Introduction Fast Data in Healthcare
More informationdocs.hortonworks.com
docs.hortonworks.com : Getting Started Guide Copyright 2012, 2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationIntroduction to Big Data. Hadoop. Instituto Politécnico de Tomar. Ricardo Campos
Instituto Politécnico de Tomar Introduction to Big Data Hadoop Ricardo Campos Mestrado EI-IC Análise e Processamento de Grandes Volumes de Dados Tomar, Portugal, 2016 Part of the slides used in this presentation
More informationIntroduction to Big Data
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Introduction to Big Data Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini
More informationTypical size of data you deal with on a daily basis
Typical size of data you deal with on a daily basis Processes More than 161 Petabytes of raw data a day https://aci.info/2014/07/12/the-dataexplosion-in-2014-minute-by-minuteinfographic/ On average, 1MB-2MB
More informationHDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1
HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,
More informationA Review Paper on Big data & Hadoop
A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College
More informationChapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationIntroduction to Oracle NoSQL Database
Introduction to Oracle NoSQL Database Anand Chandak Ashutosh Naik Agenda NoSQL Background Oracle NoSQL Database Overview Technical Features & Performance Use Cases 2 Why NoSQL? 1. The four V s of Big Data
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More informationThe age of Big Data Big Data for Oracle Database Professionals
The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG
More informationrelational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance
WHITEPAPER Relational to Riak relational Introduction This whitepaper looks at why companies choose Riak over a relational database. We focus specifically on availability, scalability, and the / data model.
More informationStages of Data Processing
Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,
More informationHow Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,
How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2013/14
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2013/14 MapReduce & Hadoop The new world of Big Data (programming model) Overview of this Lecture Module Background Cluster File
More informationBig Data Big Mess? Ein Versuch einer Positionierung
Big Data Big Mess? Ein Versuch einer Positionierung Autor: Daniel Liebhart (Peter Welkenbach) Datum: 10. Oktober 2012 Ort: DBTA Workshop on Big Data, Cloud Data Management and NoSQL BASEL BERN LAUSANNE
More informationProjected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze
Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze About HBase HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data
More informationStrategic Briefing Paper Big Data
Strategic Briefing Paper Big Data The promise of Big Data is improved competitiveness, reduced cost and minimized risk by taking better decisions. This requires affordable solution architectures which
More informationDistributed Computation Models
Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 1. Introduction Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ What is Big Data? buzzword?
More informationStreaming Integration and Intelligence For Automating Time Sensitive Events
Streaming Integration and Intelligence For Automating Time Sensitive Events Ted Fish Director Sales, Midwest ted@striim.com 312-330-4929 Striim Executive Summary Delivering Data for Time Sensitive Processes
More informationBig Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively
More information745: Advanced Database Systems
745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.
More informationThe Mathematics of Big Data
The Mathematics of Big Data Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) Math & Big Data Fall 2017 1 / 10 Introduction We briefly present Big Data and the issues associated with Big Data. Philippe
More informationData Analysis Using MapReduce in Hadoop Environment
Data Analysis Using MapReduce in Hadoop Environment Muhammad Khairul Rijal Muhammad*, Saiful Adli Ismail, Mohd Nazri Kama, Othman Mohd Yusop, Azri Azmi Advanced Informatics School (UTM AIS), Universiti
More informationTowards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis.
Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis Andrey.Sadovykh@softeam.fr www.modeliosoft.com 1 Outlines Introduction Model-driven development Big Data Juniper
More informationAn Emerging Trend of Big data for High Volume and Varieties of Data to Search of Agricultural Data
ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. www.computerscijournal.org ISSN:
More informationIn-Memory Computing Brings Operational Intelligence to Business Challenges DR. WILLIAM L. BAIN SCALEOUT SOFTWARE, INC.
In-Memory Computing Brings Operational Intelligence to Business Challenges DR. WILLIAM L. BAIN SCALEOUT SOFTWARE, INC. About the Speaker Dr. William Bain, Founder & CEO of ScaleOut Software: Email: wbain@scaleoutsoftware.com
More informationComputer-based Tracking Protocols: Improving Communication between Databases
Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability
More informationEMC ISILON HARDWARE PLATFORM
EMC ISILON HARDWARE PLATFORM Three flexible product lines that can be combined in a single file system tailored to specific business needs. S-SERIES Purpose-built for highly transactional & IOPSintensive
More information