Big Data Integrator Platform Platform Architecture and Features Dr. Hajira Jabeen Technical Team Leader-BDE University of Bonn

Similar documents
Collaboration Opportunities in Big Data Platform, Data Spaces & Semantic Interoperability Sören Auer,

Modern Data Warehouse The New Approach to Azure BI

BigDataEurope Mobility Use Case in Thessaloniki

Hadoop. Introduction / Overview

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Configuring and Deploying Hadoop Cluster Deployment Templates

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Oracle Big Data Fundamentals Ed 2

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Stages of Data Processing

BIG DATA COURSE CONTENT

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET

Big Data Architect.

Microsoft Perform Data Engineering on Microsoft Azure HDInsight.

Overview of Data Services and Streaming Data Solution with Azure

Hortonworks and The Internet of Things

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

microsoft

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

DATA SCIENCE USING SPARK: AN INTRODUCTION

iway Big Data Integrator New Features Bulletin and Release Notes

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Talend Big Data Sandbox. Big Data Insights Cookbook

Oracle Big Data Fundamentals Ed 1

Big Data with Hadoop Ecosystem

HDInsight > Hadoop. October 12, 2017

MapR Enterprise Hadoop

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Alexander Klein. #SQLSatDenmark. ETL meets Azure

Innovatus Technologies

Big Data Analytics using Apache Hadoop and Spark with Scala

Exam Questions

Flash Storage Complementing a Data Lake for Real-Time Insight

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Hadoop, Yarn and Beyond

DBpedia Data Processing and Integration Tasks in UnifiedViews

New Features and Enhancements in Big Data Management 10.2

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Big Data Applications with Spring XD

Data Governance for the Connected Enterprise

Big Data Hadoop Stack

Microsoft Big Data and Hadoop

Hadoop Development Introduction

ADABAS & NATURAL 2050+

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Azure DevOps. Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region

iway iway Big Data Integrator New Features Bulletin and Release Notes Version DN

Container 2.0. Container: check! But what about persistent data, big data or fast data?!

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Hadoop An Overview. - Socrates CCDH

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Security and Performance advances with Oracle Big Data SQL

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Advanced Continuous Delivery Strategies for Containerized Applications Using DC/OS

Deploying Applications on DC/OS

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)

Certified Big Data and Hadoop Course Curriculum

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

STATE OF MODERN APPLICATIONS IN THE CLOUD

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Verteego VDS Documentation

Data Science and Open Source Software. Iraklis Varlamis Assistant Professor Harokopio University of Athens

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS

Modernizing Business Intelligence and Analytics

Talend Big Data Sandbox. Big Data Insights Cookbook

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

arxiv: v1 [cs.dc] 20 Aug 2015

Informatica Enterprise Information Catalog

Virtuoso Infotech Pvt. Ltd.

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Oracle Application Container Cloud

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

A CD Framework For Data Pipelines. Yaniv

Integrate MATLAB Analytics into Enterprise Applications

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Using DC/OS for Continuous Delivery

Oracle GoldenGate for Big Data

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

WHITEPAPER. MemSQL Enterprise Feature List

CIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench

Agenda. Spark Platform Spark Core Spark Extensions Using Apache Spark

Spatial Analytics Built for Big Data Platforms

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Configuring Ports for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2

Data in the Cloud and Analytics in the Lake

Transcription:

Big Data Integrator Platform Platform Architecture and Features Dr. Hajira Jabeen Technical Team Leader-BDE University of Bonn BDE Presentation, EBDVF, 17

BigDataEurope 2

Making Big Data Accessible 3 How can we make it easy?

Platform Goals 4 Easy to: o o o o Install Develop Deploy Integrate

Platform Architecture 5

Platform Architecture 6

Platform Architecture 7

Platform Architecture 8 Support Layer App Layer Init Daemon Traffic Forecast GUIs Satellite Image Analysis Real-time Stream Monitoring... Platform Layer Spark Flink Monitor Kafka... Semantic Layer Ontario SANSA Semagrow Data Layer Hadoop RDF Store NOSQL Store Elasticsearch Resource Management Layer (Swarm) Hardware Layer Premises Cloud (AWS, GCE, MS Azure, ) Cassandra...

BDE Supported Frameworks 9 Search/indexing Data processing Apache Solr Apache Spark Data acquisition Apache Flink Apache Flume Semantic Components Message passing Strabon Apache Kafka Sextant Data storage GeoTriples Hue Silk Apache Cassandra SEMAGROW ScyllaDB LIMES Apache Hive 4Store Postgis OpenLink Virtuoso

Platform features 10 BDE Development Environment o Stack builder o Workflow builder o Instructions to add custom components Administrative Interface o SwarmUI o Logger Interface UI Integrator o Workflow monitor o Integrated web interface

BDE Integrator UI-WorkFlow 11 Git-clone Stackbuilder Select components => (Push Create-Flow) WorkFlow builder Arrange Components => (Push Monitor) New Stack WorkMonitor Deployment status of Components => (Push OK) SwarmUI BDE Logger See the scaling and scale up/down Navigate the componentui and deploy jobs BDE-IDE Integrator UI

Stack Builder 12

Stack Editor 13 Component Services/dockers

BDE Workflow Builder 14 Component 1 Component 2 Component 3

BDE Workflow Monitor 15 Component 1 Finished Component 2 Finished Component 3 Inprogress

Swarm UI-Pipeline 16 Increase number of instances

Monitor 17

Integrator UI 18 Component 1 Component 2

19 st Demo @ booth-10, 1 Floor

Maintenance and Uptake 20 Open-Source, Community Driven o Commitment from core BDE consortium team o Independent BD components maintenance o Platform Maintenance driven by BDI users Adopters o Feuga, Eurostat, ILVO, I2cat, Vicomtech, IoF,... Follow Up Projects o HOBBIT, Special, BigDataOcean, Qrowd, BETTER,

BDE vs Hadoop distributions 21 Hortonworks Cloudera MapR Bigtop BDE File System HDFS HDFS NFS HDFS HDFS Installation Native Native Native Native lightweight virtualization Flexible Modular Architecture no no no no yes High Availability Single failure recovery (yarn) Single failure recovery (yarn) Self healing, mult. failure rec. Single failure recovery (yarn) Failure recovery Cost Commercial Commercial Commercial Free Free Scaling Freemium Freemium Freemium Free Free Addition of custom components Not easy No No No Yes Integration testing yes yes yes yes -- Operating systems Linux Linux Linux Linux Windows/Mac/Linux Management tool Ambari Cloudera manager MapR Control system - Docker swarm + Custom UI

SANSA Scalable Semantic Analytics Stack 22

SANSA: Vision

SANSA Layers

RDF to Tensors

Machine Learning Layer

Interactive SANSA in Browser

SANSA

Semantic Data Lake 29 Data Lake o Repository of data collected in its original formats o Structured, semi-structured, unstructured o Schema-less Semantic Data Lake o Add a Semantic Layer on top of source datasets The data is semantically lifted using ontologies Provide a uniform view over nonuniform data

Semantic Data Lake 30 Execution Plan Decomposing User Query SPARQL query Metadata property -> data source (type) SQL?item gho:country?country.?item gho:disease?disease.... MongoDB Finding Relevant Data Sources + Queries Translation SELECT country, disease,... FROM Observations SQL SQL Database XPath XML File XML JSON Path MongoDB

31 BDI on Github: https://github.com/big-data-europe Technical Questions platform@big-data-europe.eu Project Website: www.big-data-europe.eu Thank you! jabeen@cs.uni-bonn.de SANSA Stack https://github.com/sansa-stack

Paris, EBDVF 22nd November 2017

The mobility use case in Thessaloniki Multisource datasets (speed, traffic flow, travel time) are being used in Thessaloniki for the provision of traffic status short-term prediction based on mobility/traffic patterns recognition. Integration of machine learning techniques using the travel times, traffic counts and speeds as well as the correlations of traffic speed, to train an appropriate Neural Network Model for efficient and robust traffic speed prediction.

The datasets Floating Car Data o o o 500 2.500 speed measurements per minute Location, speed, orientation, status Hundreds of Gb (historical dataset)

Mobility services in Thessaloniki

Mobility services in Thessaloniki

Mobility services in Thessaloniki

Mobility services in Thessaloniki

Mobility services in Thessaloniki TrafficThess (http://www.trafficthess.imet.gr) o Visual representation of the current as well as past speeds in Thessaloniki, Greece o Email notifications Historical raw data export per link in open format o TrafficPaths (http://www.trafficpaths.imet.gr) o Descriptive information of the current travel times wherever available (Thessaloniki, Patra, Irakleon, Serres, Kavala) o Mobile friendly web page TrafficThess Reports (http://www.trafficthessreports.imet.gr) o Visual and descriptive representation of the current traffic conditions (speeds & travel times) on the main roads of Thessaloniki, Greece o Highly customizable email notifications

Mobility services in Thessaloniki TrafficThess (http://www.trafficthess.imet.gr) Reliable traffic conditions monitoring on a 24/7/365 basis Traffic conditions in the city of Thessaloniki, Greece during snowfall on 10 & 11 Jan 2017: https://youtu.be/2z12tukuwam (credits to anmpout for helping out with the video) Keep calm! It s just another congestion on the ring road

Mobility services in Thessaloniki

Mobility services in Thessaloniki TrafficPaths (http://www.trafficpaths.imet.gr) Calculation of travel times on a 24/7/365 basis

Mobility services in Thessaloniki TrafficThess Reports (http://www.trafficthessreports.imet.gr) A personalized single point of access

Mobility services in Thessaloniki Datatank (Back office + restapis) CKAN (front end) http://opendata.imet.gr/data

DR. JOSEP MARIA SALANOVA GRAU JOSE@CERTH.GR +30 2310 498 433 Paris, EBDVF 22nd November 2017