Big Data Integrator Platform Platform Architecture and Features Dr. Hajira Jabeen Technical Team Leader-BDE University of Bonn BDE Presentation, EBDVF, 17
BigDataEurope 2
Making Big Data Accessible 3 How can we make it easy?
Platform Goals 4 Easy to: o o o o Install Develop Deploy Integrate
Platform Architecture 5
Platform Architecture 6
Platform Architecture 7
Platform Architecture 8 Support Layer App Layer Init Daemon Traffic Forecast GUIs Satellite Image Analysis Real-time Stream Monitoring... Platform Layer Spark Flink Monitor Kafka... Semantic Layer Ontario SANSA Semagrow Data Layer Hadoop RDF Store NOSQL Store Elasticsearch Resource Management Layer (Swarm) Hardware Layer Premises Cloud (AWS, GCE, MS Azure, ) Cassandra...
BDE Supported Frameworks 9 Search/indexing Data processing Apache Solr Apache Spark Data acquisition Apache Flink Apache Flume Semantic Components Message passing Strabon Apache Kafka Sextant Data storage GeoTriples Hue Silk Apache Cassandra SEMAGROW ScyllaDB LIMES Apache Hive 4Store Postgis OpenLink Virtuoso
Platform features 10 BDE Development Environment o Stack builder o Workflow builder o Instructions to add custom components Administrative Interface o SwarmUI o Logger Interface UI Integrator o Workflow monitor o Integrated web interface
BDE Integrator UI-WorkFlow 11 Git-clone Stackbuilder Select components => (Push Create-Flow) WorkFlow builder Arrange Components => (Push Monitor) New Stack WorkMonitor Deployment status of Components => (Push OK) SwarmUI BDE Logger See the scaling and scale up/down Navigate the componentui and deploy jobs BDE-IDE Integrator UI
Stack Builder 12
Stack Editor 13 Component Services/dockers
BDE Workflow Builder 14 Component 1 Component 2 Component 3
BDE Workflow Monitor 15 Component 1 Finished Component 2 Finished Component 3 Inprogress
Swarm UI-Pipeline 16 Increase number of instances
Monitor 17
Integrator UI 18 Component 1 Component 2
19 st Demo @ booth-10, 1 Floor
Maintenance and Uptake 20 Open-Source, Community Driven o Commitment from core BDE consortium team o Independent BD components maintenance o Platform Maintenance driven by BDI users Adopters o Feuga, Eurostat, ILVO, I2cat, Vicomtech, IoF,... Follow Up Projects o HOBBIT, Special, BigDataOcean, Qrowd, BETTER,
BDE vs Hadoop distributions 21 Hortonworks Cloudera MapR Bigtop BDE File System HDFS HDFS NFS HDFS HDFS Installation Native Native Native Native lightweight virtualization Flexible Modular Architecture no no no no yes High Availability Single failure recovery (yarn) Single failure recovery (yarn) Self healing, mult. failure rec. Single failure recovery (yarn) Failure recovery Cost Commercial Commercial Commercial Free Free Scaling Freemium Freemium Freemium Free Free Addition of custom components Not easy No No No Yes Integration testing yes yes yes yes -- Operating systems Linux Linux Linux Linux Windows/Mac/Linux Management tool Ambari Cloudera manager MapR Control system - Docker swarm + Custom UI
SANSA Scalable Semantic Analytics Stack 22
SANSA: Vision
SANSA Layers
RDF to Tensors
Machine Learning Layer
Interactive SANSA in Browser
SANSA
Semantic Data Lake 29 Data Lake o Repository of data collected in its original formats o Structured, semi-structured, unstructured o Schema-less Semantic Data Lake o Add a Semantic Layer on top of source datasets The data is semantically lifted using ontologies Provide a uniform view over nonuniform data
Semantic Data Lake 30 Execution Plan Decomposing User Query SPARQL query Metadata property -> data source (type) SQL?item gho:country?country.?item gho:disease?disease.... MongoDB Finding Relevant Data Sources + Queries Translation SELECT country, disease,... FROM Observations SQL SQL Database XPath XML File XML JSON Path MongoDB
31 BDI on Github: https://github.com/big-data-europe Technical Questions platform@big-data-europe.eu Project Website: www.big-data-europe.eu Thank you! jabeen@cs.uni-bonn.de SANSA Stack https://github.com/sansa-stack
Paris, EBDVF 22nd November 2017
The mobility use case in Thessaloniki Multisource datasets (speed, traffic flow, travel time) are being used in Thessaloniki for the provision of traffic status short-term prediction based on mobility/traffic patterns recognition. Integration of machine learning techniques using the travel times, traffic counts and speeds as well as the correlations of traffic speed, to train an appropriate Neural Network Model for efficient and robust traffic speed prediction.
The datasets Floating Car Data o o o 500 2.500 speed measurements per minute Location, speed, orientation, status Hundreds of Gb (historical dataset)
Mobility services in Thessaloniki
Mobility services in Thessaloniki
Mobility services in Thessaloniki
Mobility services in Thessaloniki
Mobility services in Thessaloniki TrafficThess (http://www.trafficthess.imet.gr) o Visual representation of the current as well as past speeds in Thessaloniki, Greece o Email notifications Historical raw data export per link in open format o TrafficPaths (http://www.trafficpaths.imet.gr) o Descriptive information of the current travel times wherever available (Thessaloniki, Patra, Irakleon, Serres, Kavala) o Mobile friendly web page TrafficThess Reports (http://www.trafficthessreports.imet.gr) o Visual and descriptive representation of the current traffic conditions (speeds & travel times) on the main roads of Thessaloniki, Greece o Highly customizable email notifications
Mobility services in Thessaloniki TrafficThess (http://www.trafficthess.imet.gr) Reliable traffic conditions monitoring on a 24/7/365 basis Traffic conditions in the city of Thessaloniki, Greece during snowfall on 10 & 11 Jan 2017: https://youtu.be/2z12tukuwam (credits to anmpout for helping out with the video) Keep calm! It s just another congestion on the ring road
Mobility services in Thessaloniki
Mobility services in Thessaloniki TrafficPaths (http://www.trafficpaths.imet.gr) Calculation of travel times on a 24/7/365 basis
Mobility services in Thessaloniki TrafficThess Reports (http://www.trafficthessreports.imet.gr) A personalized single point of access
Mobility services in Thessaloniki Datatank (Back office + restapis) CKAN (front end) http://opendata.imet.gr/data
DR. JOSEP MARIA SALANOVA GRAU JOSE@CERTH.GR +30 2310 498 433 Paris, EBDVF 22nd November 2017