Introduction to Big Data
|
|
- Evelyn Hunter
- 6 years ago
- Views:
Transcription
1 Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Introduction to Big Data Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini Big Data: fuzzy term, handle with care! Valeria Cardellini - SABD 2016/17 1
2 Source: Why Big Data? Valeria Cardellini - SABD 2016/17 2 How much data? Every day in 2014 we created: 2.5 Exabytes (2.5x x270 Zettabytes)... How big is a Zettabyte? Exabytes (2.5x1018) Petabytes (2500x1015) Terabytes ( x1012) bytes! 90% of all the data in the world has been generated over the last two years (in 2013) 40 Zettabytes of data will be created by 2020 Source: Valeria Cardellini - SABD 2016/17 3
3 How much data? Some older statistics: Google: processes more than 20 PB a day (2008) Facebook: has 2.5 PB of user data (2009) ebay: has more 6.5 PB of user data + 50 TB/day (5/2009) CERN s LHC: generates 1 PB of data per second (2013) Valeria Cardellini - SABD 2016/17 4 Big data driving factors Big Data is growing fast Mobile devices Internet of Things Valeria Cardellini - SABD 2016/17 5
4 Smart world Valeria Cardellini - SABD 2016/17 6 Internet of Things (IoT) Valeria Cardellini - SABD 2016/17 7
5 How Big? IoT impact Internet of Things (IoT) will largely contribute to increase Big Data challenges Prolifera;on of data sources Valeria Cardellini - SABD 2016/17 8 Big Data definitions Different definitions Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it with in a tolerable elapsed time for its user population. Teradata Magazine article, 2011 Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze. The McKinsey Global Institute, 2012 Big data is mostly about taking numbers and using those numbers to make predictions about the future. The bigger the data set you have, the more accurate the predictions about the future will be. Anthony Goldbloom, Kaggle s founder Big data is a term for data sets that are so large or complex that traditional data processing application softwares are inadequate to deal with them. Wikipedia, 2017 Valeria Cardellini - SABD 2016/17 9
6 so, what is Big Data? Big Data is similar to Small data, but bigger but having data bigger it requires different approaches (scale changes everything!) New methodologies, tools, architectures with an aim to solve new problems or old problems in a better way Valeria Cardellini - SABD 2016/17 10 Gartner s Big data definition The most-frequently used and perhaps, somewhat abused definition (revised version by Gartner, 2012) Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. Valeria Cardellini - SABD 2016/17 11
7 3V model for Big Data 1. Volume: data size challenging to store and process (how to index, retrieve) 2. Variety: data heterogeneity because of different data types (text, audio, video, record) and degree of structure (structured, semistructured, unstructured data) 3. Velocity: data generation rate and analysis rate Defined in 2001 by D. Laney Valeria Cardellini - SABD 2016/17 12 The extended (3+n)V model 4. Value: Big data can generate huge competitive advantages Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis. (IDC, 2011) The bigger the data set you have, the more accurate the predictions about the future will be (A. Goldbloom) 5. Veracity: uncertainty of accuracy and authenticity of data 6. Variability: data flows can be highly inconsistent with periodic peaks 7. Visualization Valeria Cardellini - SABD 2016/17 13
8 Big Data visualization Presenta;on of data in a pictorial and graphical format Mo;va;on: our brain processes images 60,000x faster than text Some examples Flight paierns Hurricanes uxblog.idvsolu;ons.com/2012/08/hurricanes-since-1851.html Rela;onships between actors who have won Oscars, the directors they have worked with and all the other actors they have worked with Valeria Cardellini - SABD 2016/17 14 Big Data on Google Trend Searched terms on Google Trend Big Data Hadoop Data mining Business analytics Database management system Valeria Cardellini - SABD 2016/17 15
9 Gartner s 2015 hype cycle for advanced analytics and data science Valeria Cardellini - SABD 2016/17 16 Big Data potential value Source: McKinsey Global Institute, 2011 Valeria Cardellini - SABD 2016/17 17
10 Why now? Because we have data Data born already in digital form 40% of data growth per year Because we can 400$ for a drive in which to store all the music of the world More than 40 years of Moore's law: we have large computational resources 76% of organizations have invested in Big Data in billion $ invested in Big Data in 2016 Valeria Cardellini - SABD 2016/17 18 Some examples of Big Data applications Consumer product companies and retail organizations monitor social media like Facebook and Twitter to get an unprecedented view into customer behavior, preferences, and product perception Manufacturers monitor minute vibration data from their equipment to predict the optimal time to replace or maintain Manufacturers also monitor social networks, but with a different goal than marketers: to detect aftermarket support issues before a warranty failure becomes publicly detrimental Governments make data public for users to develop new applications that can generate public good (Open Data initiative) Valeria Cardellini - SABD 2016/17 19
11 many other Big Data applications in very diverse sectors Crime prevention in Los Angeles Diagnosis and treatment of genetic diseases Investments in financial sector Generation of personalized advertising Astronomical discoveries See the BBC video Valeria Cardellini - SABD 2016/17 20 Examples of real-time analytics Real-time analytics over high volume sensor data: analysis of energy consumption measurements (DEBS 2014 Grand Challenge) Real-time analytics over high volume geospatial data streams: analysis of taxi trips based on a stream of trip reports from New York City (DEBS 2015 Grand Challenge) Real-time analytics metrics for a dynamic (evolving) socialnetwork graph: identification of the posts that currently trigger the most activity in the social network, and identification of large communities that are currently involved in a topic (DEBS 2016 Grand Challenge) Valeria Cardellini - SABD 2016/17 21
12 Examples of real-time analytics Finance: real-time forecasting of stock market Medicine: epidemy tracking govdatadownload.com/2015/01/27/big-data-enables-epidemic-tracking/ Security Fraud detection, DDOS attacks, behavioural pattern recognition Urban traffic management [Art14] Valeria Cardellini - SABD 2016/17 22 The Big Data process Valeria Cardellini - SABD 2016/17 23
13 The Big Data process Acquisition Requires: Selecting data Filtering data Generating metadata Managing data provenance Valeria Cardellini - SABD 2016/17 24 The Big Data process Extraction Requires: Transformation Normalization E.g., avoid duplication Cleaning Detect and correct (or remove) corrupt or inaccurate data Aggregation Error handling Valeria Cardellini - SABD 2016/17 25
14 The Big Data process Integration Requires: Standardization Conflict management Reconciliation Mapping definition Valeria Cardellini - SABD 2016/17 26 The Big Data process Analysis Requires: Exploration Data mining Machine learning Visualization Valeria Cardellini - SABD 2016/17 27
15 The Big Data process Interpretation Requires: Knowledge of the domain Knowledge of the provenance Identification of patterns of interest Flexibility of the process Valeria Cardellini - SABD 2016/17 28 The Big Data process Decision Requires: Managerial skills Continuous improvement of the process Valeria Cardellini - SABD 2016/17 29
16 Risks and challenges of Big Data Performance Data grows faster than energy on chip Efficiency Scalability and elasticity Goal: to scale linearly as workloads and data volumes grow Fault tolerance Effectiveness Heterogeneity Regarding data, processing environment, Flexibility Privacy Costs Valeria Cardellini - SABD 2016/17 30 Effectiveness of Big data analysis A famous example of inaccurate analysis Google Flu Trends predictions Sometimes very inaccurate: over the interval , when it consistently overestimated flu prevalence and over one interval in the flu season predicted twice as many doctors' visits as those recorded Lazer et al., "The Parable of Google Flu: Traps in Big Data Analysis". Science. 343 (6176): doi: /science Valeria Cardellini - SABD 2016/17 31
17 Taming performance: distribution and replication Distributed architecture The common architectural solution for Big Data processing: cluster of commodity hardware resources that work together for a common goal Scale out (or horizontally), not up (or vertically)! But elastic scale support is still a challenge Distributed processing Shared-nothing model New programming paradigms Resource replication The well-known solution to achieve fault tolerance Eventual consistency (CAP theorem!) Valeria Cardellini - SABD 2016/17 32 Shared nothing vs. other parallel architectures D. DeWitt and J. Gray, Parallel database systems: the future of high performance database systems, ACM Communications, 1992 Valeria Cardellini - SABD 2016/17 33
18 Big Data platforms Acquire, manage, process At large scales To meet QoS application requirements Valeria Cardellini - SABD 2016/17 34 Our Big Data stack High-level Interfaces Data Processing Data Storage Resource Management Support / Integration Valeria Cardellini - SABD 2016/17 35
19 The Big Data stack: BDAS BDAS: the Berkeley Data Analytics Stack Valeria Cardellini - SABD 2016/17 36 The Big Data stack: Cloudera platform Valeria Cardellini - SABD 2016/17 37
20 Data analysis paradigm shift From the old way: Structure -> Ingest -> Analyze Also known as ETL (Extract, Transform, and Load) Extract data from data sources Transform data for storing in the proper format or structure for the purposes of querying and analysis Load data into the target final system, i.e., database operational data store, data mart, or data warehouse (DWH) Valeria Cardellini - SABD 2016/17 38 Data analysis paradigm shift to the new way: Ingest -> Analyze -> Structure Also known as ELT (Extract, Load, and Transform) Extract data from the sources Load data into a data lake Transform data Advantages: No need for a separate transformation engine Data transformation and loading happen in parallel More effective Valeria Cardellini - SABD 2016/17 39
21 Data lake A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. (Wikipedia) Valeria Cardellini - SABD 2016/17 40 Some techniques for Big Data analytics Data mining: anomaly detection, association rule learning, classification, clustering, regression, summarization Machine learning: supervised learning, unsupervised learning, reinforcement learning Crowdsourcing Outsourcing human-intelligence tasks to a large group of unspecified people via Internet We do not cover them in this course Valeria Cardellini - SABD 2016/17 41
Based on Big Data: Hype or Hallelujah? by Elena Baralis
Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1 3 February 2010 Google detected flu outbreak two weeks ahead of
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference
More informationBIG DATA TESTING: A UNIFIED VIEW
http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation
More informationNewSQL Databases. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NewSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference
More informationNowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?
Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/
More informationFog Computing. The scenario
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Fog Computing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The scenario
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference
More informationSearch and Time Series Databases
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria
More informationChallenges in Data Stream Processing
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria
More informationSearch Engines and Time Series Databases
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search Engines and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18
More informationMicrosoft Exam
Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred
More informationI am a Data Nerd and so are YOU!
I am a Data Nerd and so are YOU! Not This Type of Nerd Data Nerd Coffee Talk We saw Cloudera as the lone open source champion of Hadoop and the EMC/Greenplum/MapR initiative as a more closed and
More informationData Management Glossary
Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative
More informationFrom Internet Data Centers to Data Centers in the Cloud
From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs
More informationCOMP 465 Special Topics: Data Mining
COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More information5 Fundamental Strategies for Building a Data-centered Data Center
5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More information<Insert Picture Here> Introduction to Big Data Technology
Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into
More informationChallenges and Opportunities with Big Data. By: Rohit Ranjan
Challenges and Opportunities with Big Data By: Rohit Ranjan Introduction What is Big Data? Big data is data sets that are so voluminous and complex that traditional data processing application software
More informationBig Data - Some Words BIG DATA 8/31/2017. Introduction
BIG DATA Introduction Big Data - Some Words Connectivity Social Medias Share information Interactivity People Business Data Data mining Text mining Business Intelligence 1 What is Big Data Big Data means
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationMATH36032 Problem Solving by Computer. Data Science
MATH36032 Problem Solving by Computer Data Science NO. of jobs on jobsite 1 10000 NO. of Jobs 8000 6000 4000 2000 MATLAB Data Data Science 0 Jan 2016 Jul 2016 Jan 2017 1 http://www.jobsite.co.uk/ What
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More informationSolution Brief. Bridging the Infrastructure Gap for Unstructured Data with Object Storage. 89 Fifth Avenue, 7th Floor. New York, NY 10003
89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 Solution Brief Bridging the Infrastructure Gap for Unstructured Data with Object Storage Printed in the United
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationExamples of Big Data analytics in ENEA: data sources and information extraction strategies
Examples of Big Data analytics in ENEA: data sources and information extraction strategies Ing. Giovanni Ponti, PhD ENEA DTE-ICT-HPC giovanni.ponti@enea.it DISRUPTIVE DATA 2017 5 Maggio, 2017, Via Santa
More informationWhat is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?
Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation
More informationMicrosoft Developer Day
Microsoft Developer Day Pradeep Menon Microsoft Developer Day Solutions Architect Agenda Microsoft Developer Day Traditional Business Intelligence Architecture Structured Sources Extract Transform Structurize
More informationKafka Streams: Hands-on Session A.A. 2017/18
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Kafka Streams: Hands-on Session A.A. 2017/18 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica
More informationATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V
ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V WHITE PAPER Create the Data Center of the Future Accelerate
More informationDATA MINING II - 1DL460
DATA MINING II - 1DL460 Spring 2016 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationIntroduction to Data Intensive Computing
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Introduction to Data Intensive Computing Corso di Sistemi Distribuiti e Cloud Computing A.A. 2017/18
More informationCSE6331: Cloud Computing
CSE6331: Cloud Computing Leonidas Fegaras University of Texas at Arlington c 2019 by Leonidas Fegaras Cloud Computing Fundamentals Based on: J. Freire s class notes on Big Data http://vgc.poly.edu/~juliana/courses/bigdata2016/
More informationBig Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition
Big Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition What s the BIG deal?! 2011 2011 2008 2010 2012 What s the BIG deal?! (Gartner Hype Cycle) What s the
More informationSome Big Data Challenges
Some Big Data Challenges 2,500,000,000,000,000,000 Bytes (2.5 x 10 18 ) of data are created every day! (2012) or 8,000,000,000,000,000,000 (8 exabytes) of new data were stored globally by enterprises in
More informationConvergence and Collaboration: Transforming Business Process and Workflows
Convergence and Collaboration: Transforming Business Process and Workflows Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Convergence & Collaboration:
More informationData Analytics at Logitech Snowflake + Tableau = #Winning
Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief
More informationCloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationAn Introductionto Big Data
Data Management for Data Science Corso di laurea magistrale in Data Science Sapienza Università di Roma 2016/2017 An Introductionto Big Data Domenico Lembo Dipartimento di Ingegneria Informatica Automatica
More informationThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy An Evolving Approach for Dealing with Big Data & Changing Environments bit.ly/datalake SPEAKERS: Thomas Kelly, Practice Director Cognizant Technology Solutions Sean Martin,
More informationBig Data The end of Data Warehousing?
Big Data The end of Data Warehousing? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Big data, data warehousing, advanced analytics, Hadoop, unstructured data Introduction If there was an Unwort
More informationA REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationA Review Approach for Big Data and Hadoop Technology
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 A Review Approach for Big Data and Hadoop Technology Prof. Ghanshyam Dhomse
More informationApache Storm: Hands-on Session A.A. 2016/17
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Apache Storm: Hands-on Session A.A. 2016/17 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica
More informationREVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK
REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK 1 Dr.R.Kousalya, 2 T.Sindhupriya 1 Research Supervisor, Professor & Head, Department of Computer Applications, Dr.N.G.P Arts and Science College, Coimbatore
More informationApproaching the Petabyte Analytic Database: What I learned
Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may
More informationIntroduction to the Mathematics of Big Data. Philippe B. Laval
Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2017 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationINTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING
CS 7265 BIG DATA ANALYTICS INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, PhD Computer Science,
More informationHandout 12 Data Warehousing and Analytics.
Handout 12 CS-605 Spring 17 Page 1 of 6 Handout 12 Data Warehousing and Analytics. Operational (aka transactional) system a system that is used to run a business in real time, based on current data; also
More informationREGULATORY REPORTING FOR FINANCIAL SERVICES
REGULATORY REPORTING FOR FINANCIAL SERVICES Gordon Hughes, Global Sales Director, Intel Corporation Sinan Baskan, Solutions Director, Financial Services, MarkLogic Corporation Many regulators and regulations
More informationData Mining: Approach Towards The Accuracy Using Teradata!
Data Mining: Approach Towards The Accuracy Using Teradata! Shubhangi Pharande Department of MCA NBNSSOCS,Sinhgad Institute Simantini Nalawade Department of MCA NBNSSOCS,Sinhgad Institute Ajay Nalawade
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data
More informationD B M G Data Base and Data Mining Group of Politecnico di Torino
DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results
More informationBig Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela
More informationLOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS
LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS Vandita Jain 1, Prof. Tripti Saxena 2, Dr. Vineet Richhariya 3 1 M.Tech(CSE)*,LNCT, Bhopal(M.P.)(India) 2 Prof. Dept. of CSE, LNCT, Bhopal(M.P.)(India)
More informationDatabase Management Systems
Database Management Systems Fall 2017 Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information upon it. -- Samuel Johnson (1709-1784) Queries for Today Why? Who?
More information745: Advanced Database Systems
745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 24, 2015 Course Information Website: www.stat.ucdavis.edu/~chohsieh/ecs289g_scalableml.html My office: Mathematical Sciences Building (MSB)
More informationOracle NoSQL Database Overview Marie-Anne Neimat, VP Development
Oracle NoSQL Database Overview Marie-Anne Neimat, VP Development June14, 2012 1 Copyright 2012, Oracle and/or its affiliates. All rights Agenda Big Data Overview Oracle NoSQL Database Architecture Technical
More informationThis tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.
About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts
More informationDrawing the Big Picture
Drawing the Big Picture Multi-Platform Data Architectures, Queries, and Analytics Philip Russom TDWI Research Director for Data Management August 26, 2015 Sponsor 2 Speakers Philip Russom TDWI Research
More informationCONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM
CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications
More informationAnswer: A Reference:http://www.vertica.com/wpcontent/uploads/2012/05/MicroStrategy_Vertica_12.p df(page 1, first para)
1 HP - HP2-N44 Selling HP Vertical Big Data Solutions QUESTION: 1 When is Vertica a better choice than SAP HANA? A. The customer wants a closed ecosystem for BI and analytics, and is unconcerned with support
More informationModern Database Concepts
Modern Database Concepts Introduction to the world of Big Data Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz What is Big Data? buzzword? bubble? gold rush? revolution? Big data is like teenage
More informationHeisenberg and the uncertainty laws of BI. Zoltan Vago, Senior DWH Consultant 03-June-2015
Heisenberg and the uncertainty laws of BI Zoltan Vago, Senior DWH Consultant zoltan.vago@teradata.com 03-June-2015 The uncerainty principle The more precisely the position of some particle is determined,
More informationChapter 3. Foundations of Business Intelligence: Databases and Information Management
Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional
More informationSpatial Analytics Built for Big Data Platforms
Spatial Analytics Built for Big Platforms Roberto Infante Software Development Manager, Spatial and Graph 1 Copyright 2011, Oracle and/or its affiliates. All rights Global Digital Growth The Internet of
More informationCopyright 2013 EMC Corporation. All rights reserved. BIG DATA AND SECURITY JOINING FORCES
1 BIG DATA AND SECURITY JOINING FORCES 2 Agenda Security for Big Data Big Data for Security Conclusions Structured + Unstructured Data = Big Telemetry, Location-Based, etc. Structured in Relational Databases
More informationOnline Bill Processing System for Public Sectors in Big Data
IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer
More informationBig Data: Information, Data, Events, Analytics at Scale
Big Data: Information, Data, Events, Analytics at Scale Prof Peter Triantafillou Chair of Data Systems Associate Director UBDC IDEAS Research Group School of Computing Science University of Glasgow http://dcs.gla.ac.uk/ideas/
More informationGovernment Needs in Big Data Analytics Irina Vayndiner, Ken Smith, Peter Mork
Government Needs in Big Data Analytics Irina Vayndiner, Ken Smith, Peter Mork Government Big Data Challenges Data volumes are growing fast Need to ingest larger and larger amounts of data and to perform
More informationLand Administration and Management: Big Data, Fast Data, Semantics, Graph Databases, Security, Collaboration, Open Source, Shareable Information
Land Administration and Management: Big Data, Fast Data, Semantics, Graph Databases, Security, Collaboration, Open Source, Shareable Information Platform Steven Hagan, Vice President, Engineering 1 Copyright
More informationActive Archive and the State of the Industry
Active Archive and the State of the Industry Taking Data Archiving to the Next Level Abstract This report describes the state of the active archive market. New Applications Fuel Digital Archive Market
More informationIntroduction to Data Science
UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics
More informationProcessing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.
Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most
More informationBig Data A Growing Technology
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,
More informationMassively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data
Big Data Really Fast A Proven In-Memory Analytical Processing Platform for Big Data 2 Executive Summary / Overview: Big Data can be a big headache for organizations that have outgrown the practicality
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationBig Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration
Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration WHITE PAPER / JANUARY 25, 2019 Table of Contents Introduction... 3 Harnessing the power of big data beyond the SQL world...
More informationApplying big data analytics in practice
ARISTOTLE UNIVERSITY of THESSALONIKI Applying big data analytics in practice Anastasios Gounaris School of Informatics datalab.csd.auth.gr/~gounaris email: gounaria@csd.auth.gr New data every 1 min 2 What
More informationTop Five Reasons for Data Warehouse Modernization Philip Russom
Top Five Reasons for Data Warehouse Modernization Philip Russom TDWI Research Director for Data Management May 28, 2014 Sponsor Speakers Philip Russom TDWI Research Director, Data Management Steve Sarsfield
More informationData Mining and Warehousing
Data Mining and Warehousing Sangeetha K V I st MCA Adhiyamaan College of Engineering, Hosur-635109. E-mail:veerasangee1989@gmail.com Rajeshwari P I st MCA Adhiyamaan College of Engineering, Hosur-635109.
More informationINSPIRING IOT INNOVATION: MARKET EVOLUTION TO REMOVE BARRIERS. Mark Chen Taiwan Country Manager, Senior Director, Sales of Broadcom
INSPIRING IOT INNOVATION: MARKET EVOLUTION TO REMOVE BARRIERS Mark Chen Taiwan Country Manager, Senior Director, Sales of Broadcom CAUTIONARY STATEMENT This presentation may contain forward-looking statements
More informationBIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE
BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BRETT WENINGER, MANAGING DIRECTOR 10/21/2014 ADURANT APPROACH TO BIG DATA Align to Un/Semi-structured Data Instead of Big Scale out will become Big Greatest
More informationStrategic Briefing Paper Big Data
Strategic Briefing Paper Big Data The promise of Big Data is improved competitiveness, reduced cost and minimized risk by taking better decisions. This requires affordable solution architectures which
More informationBusiness Intelligence & Predictive Analytics In Big Data For Big Insights
Business Intelligence & Predictive Analytics In Big Data For Big Insights Sanjeev D. Chaube Chief Data Scientist & Global Practice Head (Big Data Analytics) a Leading IT Company, Lunkad Abode, Viman Nagar,
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More informationWhat role will telcos play? - Monetize data or serve as trusted third parties?
What role will telcos play? - Monetize data or serve as trusted third parties? Give Customers the Control of their Digital Lives Digiworld Summit 2016 Plenary Session Day 2 Trust as an opportunity - Players
More informationPMINJ Chapter 01 May Symposium Cloud Computing and Big Data. What s the Big Deal?
PMINJ Chapter 01 May Symposium 2017 Cloud Computing and Big Data. What s the Big Deal? Arlene Minkiewicz, Chief Scientist PRICE Systems, LLC arlene.minkiewicz@pricesystems.com Agenda Introduction Cloud
More informationCombine Native SQL Flexibility with SAP HANA Platform Performance and Tools
SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been
More informationNoam Ikar R&DVP. Complex Event Processing and Situational Awareness in the Digital Age
Noam Ikar R&DVP Complex Event Processing and Situational Awareness in the Digital Age We need to correlate events from inside and outside the organization by a smart layer Cyberint CEO, Dec 2017. Wikipedia
More informationRISC-V: Enabling a New Era of Open Data-Centric Computing Architectures
Presentation Brief RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures Delivers Independent Resource Scaling, Open Source, and Modular Chip Design for Big Data and Fast Data Environments
More informationThe age of Big Data Big Data for Oracle Database Professionals
The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG
More informationBig Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012
Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema
More information