BIG DATA Architecture: Hierarchy of knowledge (9/7/2017)

Transcription:

BIG DATA Architecture

Hierarchy of knowledge

Data: An element (fact, figure, etc.) of basic information on which decisions, reasoning, or research can be based, and which is processed by humans with or without the help of computers.
Information: Data recorded on any medium for the purpose of transmitting knowledge; data put in a context.
Knowledge: An understanding of a specific subject (information), gained through experience or education. Knowledge is typically described in terms of a person's skills or expertise in a given area, and it reflects an empirical rather than intuitive understanding.
Wisdom: Optimum judgment, reflecting a deep understanding of people, things, events, or situations. A person who has wisdom can effectively apply perception and knowledge to produce desired results.

Where does the data come from?

Internet of Things (IoT): Devices are increasingly connected and integrated into our information ecosystem. More and more sensors are built into devices to make them more "intelligent" (e.g. pipelines, automobiles, aircraft, watches, clothing).
Increasingly "immersive" customer experience: Online clothing try-on with color variations and a 360-degree view. Applications using augmented reality (e.g. Pokémon GO).
The same experience can span several digital channels: Start a transaction on your cell phone and continue it with assistance (chat, telephone, etc.). Start a movie on your tablet and continue it on your TV.
New forms of competition: The cost of entry for certain business areas is changing. Payment systems (Apple Pay, Google Pay, Square, etc.). Transport (Uber), accommodation (Airbnb).

Augmented Reality

https://www.youtube.com/watch?v=7kdvljd5mue
https://www.youtube.com/watch?v=fd0xpktqev8
https://zap.works/business/
http://computer.howstuffworks.com/augmented-reality.htm
https://www.ted.com/talks/sebastian_wernicke_how_to_use_data_to_make_a_hit_tv_show

Laws, Compliance, Risk and Security

Legislators must also adapt to new market conditions and innovation:
Globalization and taxing powers
Privacy rules
Consumer protection

Information Systems Integration

Intra-enterprise information systems: ERP (Enterprise Resource Planning), PMP (Project Management Planning), MES (Manufacturing Execution Systems)
Inter-company information systems: SCM (Supply Chain Management), CRM (Customer Relationship Management)
Information systems supporting strategic management: Decision Support Systems and Business Intelligence

Data Management

Data management is part of our evolution. The organization, collection, classification, and processing of data support our evolution and our understanding of the world.
Censuses date back to antiquity (China, Egypt, Rome), enabling military recruitment, tax collection, etc.
Accounting for crops and other assets.
Scientific development based on ratings and measurements.
Today, the companies regarded as leaders in their fields are also leaders in information management.
Comments and feedback from customers have growing importance and even influence management decisions.

Visualize to Better Understand

Making data observable across several facts and dimensions helps in its interpretation. Given all the new data available, it becomes essential to have this complex analytical capacity and to be able to communicate results easily.

Figure: Figurative map of the successive losses of men of the French army in the Russian campaign of 1812-1813.

Business Architecture

Business architecture is defined as "a blueprint of the enterprise that provides a common understanding of the organization and is used to align strategic objectives and tactical demands." People who develop and maintain business architecture are known as business architects.

The business architecture should define:
1. The business objective: the things that are required.
2. The technology and the environment to be used.
3. The people to be involved, with their roles and responsibilities.
4. The process to be followed to achieve the objective.
5. The outcome after following the described process.
6. The problems that may be faced.

Big Data Architecture

Data has become the fuel of growth and innovation. Before anything else, data must be captured, and then organized and integrated. Once this phase has been successfully implemented, the data can be analyzed based on the problem at hand. Finally, the manager can take action based on the outcome of this analysis.

Supporting the functional requirements also means supporting the required performance:
Some analysis will be performed in real time.
The right amount of redundancy is needed to protect against unanticipated latency and downtime.

Any foray into big data needs to start with an analysis of the problem to be solved. Some questions to ask:
How much data will my organization need to manage today and in the future?
How often will my organization need to manage data in real time or near real time?
How much risk can my organization afford?
Is my industry subject to strict security, compliance, and governance requirements?
How important is speed to my need to manage data?
How certain or precise does the data need to be?

Big Data Architecture

A big data management architecture must include a variety of services that enable companies to make use of myriad data sources in a fast and effective manner.

Interfaces and feeds: What makes big data big is that it relies on picking up lots of data from lots of sources. Open application programming interfaces (APIs) will be core to any big data architecture. Interfaces exist at every level and between every layer of the stack. Without integration services, big data can't happen.

Redundant physical infrastructure: The physical infrastructure is based on a distributed computing model. This means that data may be physically stored in many different locations and linked together through networks, a distributed file system, and various big data analytic tools and applications. Redundancy comes in many forms. If your company has created a private cloud, you will want redundancy built into the private environment so that it can scale out to support changing workloads.

Big Data Architecture - Nodes

Security infrastructure: Some data needs to be protected, either to meet compliance requirements or to protect privacy. We need to take into account who is allowed to see the data and under what circumstances. We also need to be able to verify the identity of users as well as their access privileges.

Operational data sources: We have to incorporate all the data sources that give a complete picture of the business and show how the data affects the way the business operates. New approaches to data management are emerging in the big data world, including document, graph, columnar, and geospatial database architectures. Collectively, these are referred to as NoSQL, or "not only SQL", databases. The data architectures need to be mapped to the types of transactions; doing so helps ensure that the right data is available when it is needed. Data architectures that support complex unstructured content are also needed.

Big Data Architecture

All these operational data sources have several characteristics in common:
They represent systems of record that keep track of the critical data required for real-time, day-to-day operation of the business.
They are continually updated based on transactions happening within business units and from the web.
For these sources to provide an accurate representation of the business, they must blend structured and unstructured data.
These systems must also be able to scale to support thousands of users on a consistent basis.
These might include transactional e-commerce systems, customer relationship management systems, or call center applications.

Big Data Performance

The data architecture also needs to perform in concert with the organization's supporting infrastructure (is real-time processing needed?). Performance might also determine the kind of database you would use (what kind of queries will be run?).

Other important operational database approaches include columnar databases, which store information in columns rather than rows. This approach can lead to faster performance because a query reads only the columns it needs, which reduces input/output. When geographic data storage is part of the equation, a spatial database is optimized to store and query data based on how objects are related in space.
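The difference between row and column storage described above can be illustrated with a small sketch. This is not how any particular columnar database is implemented; the record fields (order_id, customer, amount) and the function names are hypothetical, chosen only to show that an aggregate over one field touches a single list in the column layout but every record in the row layout.

```python
# Hypothetical order records, stored row by row (one dict per record).
ROWS = [
    {"order_id": 1, "customer": "ana", "amount": 20.0},
    {"order_id": 2, "customer": "ben", "amount": 35.5},
    {"order_id": 3, "customer": "ana", "amount": 12.5},
]


def to_columns(rows):
    """Convert row-oriented records into a column-oriented dict of lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_row_store(rows):
    """Row layout: every record is visited, including fields we do not need."""
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(total_amount_row_store(ROWS))
    print(total_amount_column_store(columns))
```

Both functions return the same total; the point of the sketch is only which data each one has to read to get there.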

Big Data Performance

Organizing data services and tools: Not all the data that organizations use is operational. A growing amount of data comes from a variety of sources that aren't quite as organized or straightforward, including data that comes from machines or sensors, and massive public and private data sources. With the evolution of computing technology, it is now possible to manage immense volumes of data that previously could only have been handled by supercomputers at great expense. Prices of systems have dropped, and as a result, new techniques for distributed computing are mainstream.

The real breakthrough in big data happened as companies like Yahoo!, Google, and Facebook came to the realization that they needed help in monetizing the massive amounts of data their offerings were creating. These emerging companies needed to find new technologies that would allow them to store, access, and analyze huge amounts of data in near real time so that they could monetize the benefits of owning this much data about participants in their networks. Their resulting solutions are transforming the data management market.

Big Data - MapReduce

MapReduce was designed by Google as a way of efficiently executing a set of functions against a large amount of data in batch mode. The map component distributes the programming problem or tasks across a large number of systems and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function called reduce aggregates all the elements back together to provide a result. An example of MapReduce usage would be to determine how many pages of a book are written in each of 50 different languages.

https://www.youtube.com/watch?v=43fqzash0cq

Big Data - Big Table

Big Table was developed by Google to be a distributed storage system intended to manage highly scalable structured data. Data is organized into tables with rows and columns. Unlike a traditional relational database model, Big Table is a sparse, distributed, persistent multidimensional sorted map. It is intended to store huge volumes of data across commodity servers.

https://www.youtube.com/watch?v=7p8oxgboqog
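The MapReduce example mentioned above (counting how many pages of a book are written in each language) can be sketched on a single machine as follows. This is only an illustration of the map, group, and reduce steps, not Google's implementation: the page data is made up, all function names are hypothetical, and a real framework would run the map calls in parallel across many systems and handle failures.

```python
from collections import defaultdict

# Hypothetical input: (page_number, language) pairs for one book.
PAGES = [(1, "en"), (2, "fr"), (3, "en"), (4, "de"), (5, "fr"), (6, "en")]


def map_page(page):
    """Map step: emit a (language, 1) pair for a single page."""
    _page_number, language = page
    return (language, 1)


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: aggregate all values for one language into a total."""
    return key, sum(values)


def count_pages_by_language(pages):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = [map_page(page) for page in pages]  # parallel in a real cluster
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_pages_by_language(PAGES))
```

Because each map call depends only on its own page, the pages could be split across any number of workers without changing the result, which is the property the map step relies on.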

Big Data - Hadoop

Hadoop is an Apache-managed software framework derived from MapReduce and Big Table. Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware. The project is the foundation for the computing architecture supporting Yahoo!'s business. Hadoop is designed to parallelize data processing across computing nodes to speed computations and hide latency. Hadoop has two major components: a massively scalable distributed file system that can support petabytes of data, and a massively scalable MapReduce engine that computes results in batch.

https://www.youtube.com/watch?v=9s-vsewej1u
https://www.youtube.com/watch?v=mff750yvdxm
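As a rough illustration of what a distributed file system does, the sketch below splits a file into fixed-size blocks and assigns each block to more than one node. It does not use any Hadoop API. The block size, replication factor, node names, and the round-robin placement are all assumptions made for the example and are not Hadoop's actual defaults or placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the file contents into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes, replication: int):
    """Assign each block index to `replication` distinct nodes (simple round-robin).

    Hypothetical placement rule, used only to show that every block lives on
    several machines so that losing one node does not lose the data.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_index in range(num_blocks):
        placement[block_index] = [
            nodes[(block_index + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    print(blocks)
    print(place_replicas(len(blocks), nodes, replication=2))
```

With every block stored on two nodes in this sketch, a batch job can read a block from whichever copy is available, which is one way redundancy helps hide latency and failures.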