Big Data - Some Words BIG DATA 8/31/2017. Introduction

BIG DATA: Introduction

Big Data - Some Words
Connectivity, social media, information sharing, interactivity, people, business, data, data mining, text mining, business intelligence.

What is Big Data

Big Data means different things to people with different backgrounds and interests. Traditionally, "Big Data" meant massive volumes of data, for example the data volumes handled by CERN, NASA, or Google.

Where does Big Data come from? All over: web logs, RFID, GPS systems, sensor networks, social networks, text documents on the Internet, Internet search, index cards, call detail records, astronomy, atmospheric science, biology, genomics, nuclear physics, biochemical experiments, records, medical research, military surveillance, archives, multimedia, and so on.

What is Big Data

Records of each step of modern life on social networks, and the sharing of information between people and businesses, have changed the general culture of humanity and created an environment conducive to an unprecedented wave of innovation. The recording of activities, data, and behaviours is creating a new way for people and companies to interact. The amount of data that YOU generate is amazing and rich in information.

The move from analog to digital ushered in the digital age, an era when people equipped with smartphones and sensors began uploading troves of searchable digital content. While data used to stack up in fairly linear fashion, digital content is now created by consumers and is multiplying at rates previously unheard of. The volume of data generated doubles every 2 years; soon it will double every 18 months.
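As a rough illustration of what a fixed doubling time implies, the sketch below computes the growth factor after a number of years and the equivalent yearly growth rate. It assumes idealized exponential growth; the function names are made up for this example, and the only inputs taken from the slides are the 2-year and 18-month doubling times.

```python
def projected_volume(initial_volume, years, doubling_years):
    """Volume after `years`, assuming it doubles every `doubling_years`.

    Idealized model (an assumption): V(t) = V0 * 2 ** (t / T).
    """
    return initial_volume * 2 ** (years / doubling_years)


def annual_growth_rate(doubling_years):
    """Yearly growth rate equivalent to the given doubling time."""
    return 2 ** (1 / doubling_years) - 1


if __name__ == "__main__":
    # Doubling times quoted on the slide: 2 years now, 18 months "soon".
    for doubling_years in (2.0, 1.5):
        rate = annual_growth_rate(doubling_years)
        factor = projected_volume(1.0, 6, doubling_years)
        print(f"doubling every {doubling_years} years: "
              f"{rate:.1%} per year, x{factor:.0f} after 6 years")
```

Under these assumptions, a 2-year doubling time corresponds to roughly 41% growth per year, and an 18-month doubling time to roughly 59% per year.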

Big Data - Digital Data Volume [figure]

Big Data - Bytes Chart [figure]

Big Data - Bytes Size

0.5 ZB: all Internet data up to 2009.
1 ZB: the capacity of about 75 billion iPad Air tablets (16 GB each), which, if stacked, would span 1.5 times the distance between the Earth and the Moon.
42 ZB: all the words ever spoken by humanity throughout history, if they could be digitized.

Big Data - Data Creation

Data creation is not slowing down:
Large Hadron Collider (the world's largest and most powerful particle accelerator): 1 PB/sec
Boeing jet: 20 TB/hr
Facebook: 500 TB/day
YouTube: 1 TB every 4 minutes
Square Kilometre Array (proposed as the world's biggest telescope): 1 EB/day
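The rates above use different units and time periods, which makes them hard to compare at a glance. The sketch below converts them to terabytes per day. It assumes decimal (SI) prefixes, where 1 PB = 1,000 TB and 1 EB = 1,000,000 TB, and it assumes each source runs continuously, which is a simplification. The helper name `tb_per_day` is invented for this example.

```python
# Decimal (SI) prefixes are an assumption; the slides do not say which they use.
TB_PER_UNIT = {"TB": 1, "PB": 1_000, "EB": 1_000_000}
SECONDS_PER_DAY = 86_400


def tb_per_day(amount, unit, period_seconds):
    """Convert `amount` of `unit` per `period_seconds` into TB per day."""
    return amount * TB_PER_UNIT[unit] * SECONDS_PER_DAY / period_seconds


# Figures as quoted on the slide, normalized to TB/day.
RATES = {
    "Large Hadron Collider": tb_per_day(1, "PB", 1),        # 1 PB/sec
    "Boeing jet": tb_per_day(20, "TB", 3_600),              # 20 TB/hr
    "Facebook": tb_per_day(500, "TB", SECONDS_PER_DAY),     # 500 TB/day
    "YouTube": tb_per_day(1, "TB", 4 * 60),                 # 1 TB per 4 min
    "Square Kilometre Array": tb_per_day(1, "EB", SECONDS_PER_DAY),
}

if __name__ == "__main__":
    for source, rate in sorted(RATES.items(), key=lambda item: item[1]):
        print(f"{source:>24}: {rate:>14,.0f} TB/day")
```

On these figures, a Boeing jet flying around the clock would produce 480 TB per day, YouTube 360 TB per day, and the Large Hadron Collider 86.4 million TB (86.4 EB) per day.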

Big Data - Numbers

Facebook
Worldwide, there were over 2.01 billion monthly active Facebook users as of June 2017, a 17 percent increase year over year. There were 1.15 billion mobile daily active users as of December 2016, an increase of 23 percent year over year. On average, the Like and Share buttons are viewed across almost 10 million websites daily. Five new profiles are created every second. There are 83 million fake profiles. Photo uploads total 300 million per day. Every 60 seconds on Facebook, 510,000 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded. One in five page views in the United States occurs on Facebook. 16 million local business pages had been created as of May 2013, a 100 percent increase from 8 million in June 2012.
Source: https://zephoria.com/top-15-valuable-facebook-statistics/

Big Data - Numbers

Google
Google estimates that about 5 exabytes of information are generated every two days, as much as humanity generated throughout its history up to 2003.

Twitter
Total number of monthly active Twitter users: 328 million.
Total number of tweets sent per day: 500 million.

Walmart
The world's biggest retailer, with over 20,000 stores in 28 countries, is building the world's biggest private cloud to process 2.5 petabytes of data every hour.

Big Data - Numbers

Emails
The estimated number of email users worldwide is 3.7 billion, and the number of emails sent per day (in 2017) is around 269 billion.
First email system: 1971.
The average office worker receives 121 emails a day.
Percentage of email that is spam: 49.7%.

Big Data - Definition

There are several definitions of Big Data from leading authors in the market.

The McKinsey Global Institute defines Big Data as "the intense use of online social networks, mobile devices for Internet connection, transactions and digital content, as well as the increasing use of cloud computing, which has generated untold amounts of data. The term 'Big Data' refers to this data set whose growth is exponential and whose dimension is beyond the capabilities of the typical tools to capture, manage and analyze data."

Gartner defines Big Data as "the term adopted by the market to describe problems in managing and processing extreme information that exceed the capacity of traditional information technologies over one or several dimensions. Big Data is focused primarily on extremely large dataset volume issues generated from technological practices such as social media, operating technologies, Internet access, and distributed information sources. Big Data is essentially a practice that introduces new business opportunities."

Big Data - 3 V's or more

Big Data is characterized by the three V's: Volume, Variety, and Velocity. Besides these dimensions, some authors use other V's: Veracity (IBM), Variability (SAS), and Value.

Big Data - Volume

Volume is the most common trait of Big Data: it represents the increase in the amount of data we have. Many factors have contributed to the exponential increase in data volume, such as transaction-based data stored through the years, text data constantly streaming from social media, increasing amounts of sensor data being collected, automatically generated RFID and GPS data, and so on. In the past, excessive data volume created storage issues, both technical and financial. Today, advanced technologies coupled with decreasing storage costs have eased these issues.

Big Data - Volume [figure]

Big Data - Variety

Data today comes in all types of formats: databases, XML files, text files, images, videos, sensor captures, emails, and more. About 85% of all organizations' data is in some sort of unstructured or semi-structured format (a format that is not suitable for traditional database schemas).

Big Data - Variety [figure]

Big Data - Velocity

Velocity means how fast data is being produced and how fast it must be processed. Reacting quickly enough to deal with velocity is a challenge for most organizations, particularly in time-sensitive environments.

Big Data - Velocity [figure]

Other V's

Veracity: conformity to facts, that is, accuracy, quality, truthfulness, or trustworthiness.
Variability: inconsistency of the data flow, linked to events or periodic peaks.
Value: by analyzing large and feature-rich data, organizations can gain greater business value. Big data means big analytics. Big analytics means greater insight and better decisions, something that every organization needs.

Big Data - Veracity [figure]

Big Data - Value [figure]

The worst place to park in New York City using big data
https://www.youtube.com/watch?v=lz_kidxbzga

Structured Data

The term structured data generally refers to data that has a defined length and format. Examples of structured data include numbers, dates, and groups of words and numbers called strings (for example, a customer's name, address, and so on). Structured data is the data that you're probably used to dealing with. It's usually stored in a database, and you can query it using a language like Structured Query Language (SQL). Traditional sources include Customer Relationship Management (CRM) data, operational Enterprise Resource Planning (ERP) data, and financial data.
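To make "defined length and format" and "query it using SQL" concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `customers` table, its columns, and the sample rows are invented for illustration and are not taken from any real CRM system.

```python
import sqlite3


def build_customer_db():
    """Create an in-memory database with a small, hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    # Every column has a declared name and type: this fixed schema is what
    # makes the data "structured".
    conn.execute(
        """
        CREATE TABLE customers (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            city        TEXT NOT NULL,
            signup_date TEXT NOT NULL  -- ISO 8601 date string
        )
        """
    )
    conn.executemany(
        "INSERT INTO customers (name, city, signup_date) VALUES (?, ?, ?)",
        [
            ("Bruno", "Lisbon", "2017-03-14"),
            ("Ana", "Lisbon", "2016-11-02"),
            ("Chen", "Toronto", "2017-08-31"),
        ],
    )
    conn.commit()
    return conn


def customers_in_city(conn, city):
    """Return customer names for one city, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_customer_db()
    print(customers_in_city(conn, "Lisbon"))
```

Because every row follows the same schema, a one-line SQL query is enough to filter and sort the records.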

Sources of structured data

Computer- or machine-generated: Machine-generated data generally refers to data that is created by a machine without human intervention.
Human-generated: This is data that humans, in interaction with computers, supply.
Sensor data: Examples include radio frequency ID (RFID) tags, smart meters, medical devices, and Global Positioning System (GPS) data. For example, RFID is rapidly becoming a popular technology. It uses tiny computer chips to track items at a distance.
Web log data: When servers, applications, networks, and so on operate, they capture all kinds of data about their activity. This can amount to huge volumes of data that can be useful, for example, to deal with service-level agreements or to predict security breaches.
Point-of-sale data: When the cashier swipes the bar code of any product that you are purchasing, all the data associated with the product is generated. Just think of all the products across all the people who purchase them, and you can understand how big this data set can be.

Sources of structured data

Financial data: Lots of financial systems are now programmatic; they are operated based on predefined rules that automate processes. Stock-trading data is a good example of this. It contains structured data such as the company symbol and dollar value. Some of this data is machine generated, and some is human generated.
Input data: This is any piece of data that a human might input into a computer, such as name, age, income, non-free-form survey responses, and so on. This data can be useful to understand basic customer behavior.
Click-stream data: Data is generated every time you click a link on a website. This data can be analyzed to determine customer behavior and buying patterns.
Gaming-related data: Every move you make in a game can be recorded. This can be useful in understanding how end users move through a gaming portfolio.
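Web log data is a good example of machine-generated structured data, because each line follows a fixed layout. The sketch below parses one line in the widely used Common Log Format into named fields and counts HTTP status codes. The sample line, the regular expression, and the function names are assumptions made for this example; real servers are often configured with different log formats.

```python
import re
from collections import Counter

# Pattern for the Common Log Format (host, identity, user, time, request,
# status, size). Treat it as a sketch, not a complete log parser.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical log line, using a documentation-only IP address.
SAMPLE_LINE = (
    '192.0.2.10 - - [31/Aug/2017:10:15:32 +0000] '
    '"GET /products/42 HTTP/1.1" 200 5120'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def status_counts(lines):
    """Count HTTP status codes over all lines that can be parsed."""
    records = (parse_log_line(line) for line in lines)
    return Counter(r["status"] for r in records if r is not None)


if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
    print(status_counts([SAMPLE_LINE, SAMPLE_LINE, "not a log line"]))
```

Once each line is a record with typed fields, the same counting approach extends to click-stream or point-of-sale records.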

Unstructured Data

Unstructured data is data that does not follow a specified format. If 15% of the data available to enterprises is structured data, the other 85% is unstructured. Unstructured data is really most of the data that you will encounter. Until recently, however, the technology didn't really support doing much with it except storing it or analyzing it manually. Unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data. Just as with structured data, unstructured data is either machine generated or human generated.

Sources of unstructured data

Satellite images: This includes weather data or the data that the government captures in its satellite surveillance imagery. Just think about Google Earth, and you get the picture (pun intended).
Scientific data: This includes seismic imagery, atmospheric data, and high energy physics.
Photographs and video: This includes security, surveillance, and traffic video.
Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic profiles.
Text internal to your company: Think of all the text within documents, logs, survey results, and e-mails. Enterprise information actually represents a large percentage of the text information in the world today.
Social media data: This data is generated from social media platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr.
Mobile data: This includes data such as text messages and location information.
Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.
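Unstructured text has no schema to query, so analysis usually starts by deriving some structure from it. The sketch below shows one very simple way to do that for free-form survey responses: split the text into words, drop a few common stop words, and count the rest. The sample responses, the stop-word list, and the function names are invented for this example; real text mining uses far richer techniques.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption made for this sketch).
STOP_WORDS = {"a", "an", "and", "but", "the", "was", "were", "is", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(documents):
    """Count non-stop-word terms across a collection of free-text documents."""
    counts = Counter()
    for document in documents:
        counts.update(word for word in tokenize(document)
                      if word not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # Hypothetical free-form survey responses.
    responses = [
        "Delivery was late and the box was damaged.",
        "Great price, but delivery was late again.",
        "The box arrived damaged.",
    ]
    for term, count in term_counts(responses).most_common(4):
        print(term, count)
```

The resulting term counts are structured data and could be stored in an ordinary database table alongside the original text.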

Structured vs. Unstructured Data [figure]

Blockchain
https://www.youtube.com/watch?v=pl8olkkwrpc