Cognitive-based Computation, Semantic Understanding, and Web Wisdom

Panel SEMAPRO/ADVCOMP/DATA ANALYTICS, Porto, 01.10.2013
Cognitive-based Computation, Semantic Understanding, and Web Wisdom

Moderation: Fritz Laux, Reutlingen University, Germany

Panelists:
- Felix Heine, University of Applied Sciences & Arts, Hannover, Germany
- Rudolf Berrendorf, Bonn-Rhein-Sieg University, Germany
- Alexey Cheptsov, High Performance Computing Center Stuttgart - HLRS, Germany

Knowledge Stack (F. Laux)

- Data: sequence of symbols from a well-defined set of symbols; information (facts) coded for mechanized processing (DIN)
- Information: data + metadata; recognized (relevant) data
- Knowledge: information + thinking process; information linked to the person's background knowledge
- Wisdom: applied knowledge (English, 1999); understanding in the context of a person's background knowledge and ethics

Definitions after L. P. English (1999): data := facts; information := data in context; knowledge := info in context; wisdom := knowledge in context (applied knowledge)

Problem 1: Semantic Loss

[Figure: knowledge transfer of the decision "To buy or not to buy" from a person with background and ethics 1 to a receiver with background and ethics 2]

Does the receiver come to the same conclusion?

Source: Felix Schiele, A Layered Model for Knowledge Transfer, SemaPro 2013

Problem 2: Reliability of Information

Current state of the web: (Google) query "What is the height of Mt. Everest?"
- None answered the question, but the first 4 links, including Wikipedia, said: 8848 m
- The 4th link had two different heights (in m)
- The 5th answer, from Encyclopedia Britannica, claims 8850 m

Which one is correct?
- Majority
- Trusted source
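The two resolution strategies named on the slide, majority and trusted source, can be sketched in a few lines of Python. This is only an illustration: the source labels and the trust scores are hypothetical, and only the two height values come from the slide.

```python
from collections import Counter


def majority_answer(answers):
    """Return the value reported most often among (source, value) pairs."""
    counts = Counter(value for _, value in answers)
    value, _ = counts.most_common(1)[0]
    return value


def trusted_answer(answers, trust):
    """Return the value reported by the source with the highest trust score."""
    _, value = max(answers, key=lambda pair: trust.get(pair[0], 0.0))
    return value


# Hypothetical source labels and trust scores; heights in metres as on the slide.
answers = [("link_1", 8848), ("link_2", 8848), ("link_3", 8848), ("link_5", 8850)]
trust = {"link_1": 0.5, "link_2": 0.4, "link_3": 0.6, "link_5": 0.9}

majority = majority_answer(answers)       # most frequent value
trusted = trusted_answer(answers, trust)  # value from the most trusted source
```

With these assumed scores the two strategies disagree, which is exactly the situation the slide asks about.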

New Opportunities with Clever Techniques and Big Iron

[Images courtesy: LLNL, IBM]

Traditional Way

- Use a mobile phone / tablet / PC / server / mainframe and run a program / an algorithm to fulfill some task
- All algorithm variants are in use: deterministic, randomized, ...
- Most times, you need to know in advance (when you program) what you want to do (e.g., signal processing)

Another Way

- You already have Giga/Tera/Peta/Exa/Zetta/Yotta bytes of information (in some way structured, unstructured, ...)
- Learn from these existing databases
  - to answer questions (e.g., Watson)
  - to solve problems
  - to detect problems
  - to make projections into the future
- Lots of techniques around this idea have been known for quite some time (AI, Machine Learning, Neural Networks, Data Mining, ...)

Clever Algorithms Are Sometimes Not Enough

- As data becomes really large and/or algorithms need to be more clever (need much more time to compute), the usual mobile phone / tablet / PC / ... does not suffice any more
- Limitations are at any single point in the usual hardware: raw compute power, available memory, I/O bandwidth, network bandwidth, ...
- Some people stop here!

Here Comes the Sun

                     PC              LLNL Sequoia
cores                < 10            1.6 million
FP performance       < 100 GFlops    20 PetaFlops
main memory          4-8 GB          1.6 PetaBytes
network bandwidth    1 GigaBits/s    30 PetaBytes/s (internal network)

courtesy: LLNL

Use It!

- Use the raw power you need, somewhere in the spectrum from smaller up to big, big machines
  - to process / learn from big, big data
  - to find better solutions
  - to answer additional questions that could not be answered before
- For example:
  - run many, many filters / mining algorithms in parallel and combine intermediate results
  - for optimization problems, start processing with many different seeds in parallel
- Start thinking about the opportunities with tomorrow's compute capacities
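The second example, starting an optimization with many different seeds in parallel and combining the results, might look like the following minimal Python sketch. The objective function, the search range and the use of a thread pool are all assumptions made for the illustration; on a real cluster each run would be a separate process or MPI rank.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    """Toy objective with its minimum at x = 3 (an assumption for this demo)."""
    return (x - 3.0) ** 2


def random_search(seed, steps=2000):
    """One independent run: sample points uniformly and keep the best one."""
    rng = random.Random(seed)
    best_x = rng.uniform(-10.0, 10.0)
    best_f = objective(best_x)
    for _ in range(steps):
        x = rng.uniform(-10.0, 10.0)
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_f, best_x, seed


def multi_start(seeds, workers=4):
    """Run one search per seed concurrently, then combine by taking the minimum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(random_search, seeds))
    return min(results)


best_f, best_x, best_seed = multi_start(range(8))
```

Each run is independent, so the combination step is a single `min` over the per-seed results; this is what makes the pattern easy to spread over many cores.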

Panel SEMAPRO/ADVCOMP/DATA ANALYTICS
Dr.-Ing. Alexey Cheptsov
SEMAPRO 2013 :: 30.09.2013 :: Dr. Alexey Cheptsov

Large-scale Graph Computing: Some Examples

- Google's Knowledge Graph: 700 million nodes, 20 billion facts, several terabytes of files
- Facebook's Social Graph: 60 PB of graph-structured data
- Twitter's Interest Graph
- NoSQL database solutions

Credit: Google, http://www.stateofsearch.com/search-in-the-knowledge-graph-era/

Use of Supercomputers

[Figure labels: Hardware Infrastructures, e-services, Applications, Semantic Web, Linked Data, Data, Facts, Information, Knowledge, Intranet, Internet, Data Center, Cloud, HPC]

Hermit, the HLRS mainstream system
- Cray XE6 architecture
- Performance of 1.2 PetaFLOPS (10^15 floating point operations per second)
- 3552 compute nodes
- 64 GB RAM per node
- 2.7 PB disk space

What are the challenges: Semantic Web View

Infrastructure on demand
- shared/distributed memory parallel clusters
- multicore machines
- GPGPU devices
- FPGA
- all together?

Programming models to achieve high performance
- MapReduce
- MPI
- New programming languages

JUNIPER: Java platform for high performance and real-time large scale data management
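As a rough illustration of the MapReduce programming model listed above, the following single-process Python sketch counts facts per predicate in a handful of subject-predicate-object triples. The function names and the sample triples are made up for the example; a real framework would run the map and reduce phases on many nodes and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(triple):
    """Emit (predicate, 1) for one (subject, predicate, object) triple."""
    _, predicate, _ = triple
    return [(predicate, 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def map_reduce(triples):
    """Chain map, shuffle and reduce over all input triples."""
    pairs = [pair for triple in triples for pair in map_phase(triple)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Illustrative input only.
triples = [
    ("Everest", "height", "8848 m"),
    ("Everest", "locatedIn", "Nepal"),
    ("K2", "height", "8611 m"),
]
counts = map_reduce(triples)
```

Because `map_phase` looks at one triple at a time and `reduce_phase` at one key at a time, both phases can be distributed without changing the user-written functions.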

Main Results

- HPC is going to face new challenges related to data-centric application expansion.
- Parallel programming models (mainly MapReduce and MPI) are the key enablers of HPC for data-centric applications.
- Reaching near-peak performance is going to be the major challenge.

Future Work

- Promote existing technologies, such as MPI, to solve new challenges, such as Big Data.
- Make existing frameworks more data-centric.

Panel: Data Analytics 2013
Data Quality and Big Data
Felix Heine

Data driven decisions

[Figure, shown twice: Algorithm: Tested. Data: Tested?]

Data Quality and Big Data / Felix Heine

Data Quality

Data gets more and more important. However, data quality is underestimated:
- "My data does not have any errors"
- "Yes, I know data quality is important, however, I will spend my budget first for new features"

Compare with algorithms:
- "My program does not have any error" ???
- "Yes, I know, testing is important, however, I will first develop new features" ???

Data Quality and Big Data

Big Data: 3 V's
- Volume: Large amounts of data
- Velocity: Stream data with very high data rates
- Variety: Not only relational data: text, binary, XML, ...
- Sometimes also: Veracity. Not a property of the data!

Collect first, analyse later

Monitor your data quality!

Quality of Big Data: What can we do?

Understand your data!
- Use data profiling tools
- Research challenge: more sophisticated profiling
  - Statistics, machine learning, time series, ...
  - Good visualization of the results
  - Scalability

Keep the knowledge for constant monitoring
- In which language?
- Research challenge: What will be the SQL for Big Data?
  - Declarative language for data analytics
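To make "data profiling" concrete, here is a minimal Python sketch that computes a few basic statistics per column. It is not a substitute for a profiling tool; the function names, the chosen statistics and the sample rows are assumptions for the illustration.

```python
def profile_column(values):
    """Basic profile of one column: row count, nulls, distinct values, range."""
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = sorted({name for row in rows for name in row})
    return {
        name: profile_column([row.get(name) for row in rows])
        for name in columns
    }


# Illustrative rows only; a missing measurement is recorded as None.
rows = [
    {"peak": "Everest", "height_m": 8848},
    {"peak": "Everest", "height_m": 8850},
    {"peak": "K2", "height_m": None},
]
report = profile_table(rows)
```

Storing such a report and comparing it with the profile of the next data delivery is one simple way to keep the knowledge for constant monitoring: a sudden jump in `nulls` or a shifted `min`/`max` range signals a quality problem.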