Machine Learning & Science Data Processing
|
|
- Homer Henry
- 6 years ago
- Views:
Transcription
1 Machine Learning & Science Data Processing Rob Lyon SKA Group University of Manchester
2 Machine Learning (1) Collective term for branch of A.I. Uses statistical tools to make decisions over data intelligently. Appearance of intelligence is an illusion backed up by functions. So how does it work?
3 Machine Learning (1) Collective term for branch of A.I. Uses statistical tools to make decisions over data intelligently. Appearance of intelligence is an illusion backed up by functions. So how does it work? f(x) f(x) 0 f(x) 00 Observe Collect Data Build Model
4 Machine Learning (2) Redesign Intensity Phase Training Validation MAP Design Evaluate Build
5 Machine Learning & SDP
6 Machine Learning & SDP SDP converts / filters CSP data in to products useful for science.
7 Machine Learning & SDP SDP converts / filters CSP data in to products useful for science.
8 Machine Learning & SDP SDP converts / filters CSP data in to products useful for science.
9 Machine Learning & SDP SDP converts / filters CSP data in to products useful for science. Includes pulsar timing, single pulse search (transients signals, FRBs) and periodicity search (pulsars).
10 Machine Learning & SDP SDP converts / filters CSP data in to products useful for science. Includes pulsar timing, single pulse search (transients signals, FRBs) and periodicity search (pulsars). For single pulse and periodicity search, CSP data products describe potential observations of astrophysical phenomena - new discoveries?
11 Existing Approaches Applied to candidate selection for single pulse and periodicity search. Supervised machine learning algorithms. Learn from fixed-size training sets of examples. Variety of algorithms used, with varying computational requirements Sub-Band Number Sub-Int Number 4.0e-032.0e-03 0 Amplitude Sub-Bands Bin Number Sub-Integrations Bin Number Profile A B C Phase DM SNR Period-DM Plane -4.0e e e e e+10 Topo Period Offset from ms DM Curve DM Bary Period Bary Freq Topo Period Topo Freq Acc Jerk DM width SNR Initial N/A D E Optimised ms Hz ms Hz m/s/s m/s/s/s cm/pc periods periods
12 Existing Approaches Applied to candidate selection for single pulse and periodicity search. Supervised machine learning algorithms. Learn from fixed-size training sets of examples. Variety of algorithms used, with varying computational requirements.
13 Which method?
14 Issues With ML at Scale
15 Issues with ML at SKA Scale ML typically very accurate if training data is good. Problems: 1. Not optimised to minimise resource use. 2. Non-adaptive, and retraining with more examples can be expensive (depending on the algorithm). Other issues: training data hard to obtain, classifier decisions often hard to audit.
16 Practical Issues with ML at SKA Scale (1) Adapting to distributional change advantageous. Rapidly adapting to new training examples important for discovery. 1 Period SNR Mean value (normalised) Feature 1 Feature 2 Feature 12 Feature 13 Feature 14 0 DM Sample number
17 Structural Issues with ML at SKA Scale (2) Performance issues 3000 Small sample problem Small disjuncts due to imbalance. How to acquire training examples? How to incorporate expert feedback? How to audit Chi squared value from fitting sine-squared curve to pulse profile Class overlap classifications? Pulsar Non-pulsar Chi squared value from fitting sine curve to pulse profile. 1500
18 Exploring solutions
19 Possible SDP Approach Data stream learning methods. Very low resource requirements. Able to adapt to concept drift. Able to learn from new training examples observed over time. Classifier Predict Store a) Incremental Model Train Test Report Performance b) Batch Model Report Performance
20 Incremental Stream Prototype Process OCLDs Pre-process data ML prediction Source matching Archive OCLD Spouts Key n 0 n 1 2 n 2 2 n 3 2 n 4 2 n 5 2 n 6 2 n 7 n 8 Telescope Manager / Science Teams Bolts Sift per beam Feature extraction Second sift Alerts Spouts Auditing data Primary data Auditing nodes (deactivated during normal execution)
21 Stream Classifier: GH-VFDT
22 Algorithm Performance See Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach, Lyon et al, accepted for publication in MNRAS, Other results in: Hellinger Distance Trees for Imbalanced Streams, Lyon et al., ICPR, 2014.
23 Prototype Performance Local tests on a single machine (Quad Core i7) 1,000 candidates per second Approx. 2 seconds for a candidate to move through the system. Relatively easy to configure & program. Possible problems: you may want to send more than a tuple. you may want data to go backwards.
24 Open Questions How to acquire unlimited supply of accurately labelled data? To what extent does pulsar data drift? Will a multi-class approach improve classification performance? What will the final compute environment look like? How do we keep track of training data and validate our approaches? Are our features good enough?
25 Summary Structural issues with ML at scale Practical issues with ML at scale Success depends on understanding the data Prototype pipelines under development Thanks for listening!
26 Candidate Numbers
27 Tree Classification Example x 1 x 2 x 3 x 4 x 5 x n h(cm) w(kg) l(cm) h > 50? Predictions x 1 x 2 x No Yes x 1 x 2 x 3 A B A No w > 5? l > 49? Yes No Yes A B A B
28 Apache Storm Cluster Master node Nimbus Zookeeper Zookeeper cluster Zookeeper Zookeeper JVM JVM Worker Process Worker Process Executor Executor Spout Spout Bolt Bolt JVM Worker Process Executor Spout Bolt Executor Executor Executor Bolt Bolt Bolt Worker node Worker node Worker node Supervisor Supervisor Supervisor JVM JVM Worker Process Worker Process Executor Executor Spout Spout Bolt Bolt JVM Worker Process Executor Spout Bolt Executor Executor Executor Bolt Bolt Bolt 1 n
Streaming & Apache Storm
Streaming & Apache Storm Recommended Text: Storm Applied Sean T. Allen, Matthew Jankowski, Peter Pathirana Manning 2010 VMware Inc. All rights reserved Big Data! Volume! Velocity Data flowing into the
More informationFROM LEGACY, TO BATCH, TO NEAR REAL-TIME. Marc Sturlese, Dani Solà
FROM LEGACY, TO BATCH, TO NEAR REAL-TIME Marc Sturlese, Dani Solà WHO ARE WE? Marc Sturlese - @sturlese Backend engineer, focused on R&D Interests: search, scalability Dani Solà - @dani_sola Backend engineer
More informationApache Storm. Hortonworks Inc Page 1
Apache Storm Page 1 What is Storm? Real time stream processing framework Scalable Up to 1 million tuples per second per node Fault Tolerant Tasks reassigned on failure Guaranteed Processing At least once
More informationScalable Streaming Analytics
Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according
More informationPaper Presented by Harsha Yeddanapudy
Storm@Twitter Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal,
More informationBefore proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors.
About the Tutorial Storm was originally created by Nathan Marz and team at BackType. BackType is a social analytics company. Later, Storm was acquired and open-sourced by Twitter. In a short time, Apache
More informationTutorial: Apache Storm
Indian Institute of Science Bangalore, India भ रत य वज ञ न स स थ न ब गल र, भ रत Department of Computational and Data Sciences DS256:Jan17 (3:1) Tutorial: Apache Storm Anshu Shukla 16 Feb, 2017 Yogesh Simmhan
More informationSDP Memo 40: PSRFITS Overview for NIP
SDP Memo 40: PSRFITS Overview for NIP Document Number.......................................................... SDP Memo 40 Document Type.....................................................................
More informationApache Storm: Hands-on Session A.A. 2016/17
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Apache Storm: Hands-on Session A.A. 2016/17 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica
More informationREAL-TIME ANALYTICS WITH APACHE STORM
REAL-TIME ANALYTICS WITH APACHE STORM Mevlut Demir PhD Student IN TODAY S TALK 1- Problem Formulation 2- A Real-Time Framework and Its Components with an existing applications 3- Proposed Framework 4-
More informationPractical Machine Learning Agenda
Practical Machine Learning Agenda Starting From Log Management Moving To Machine Learning PunchPlatform team Thales Challenges Thanks 1 Starting From Log Management 2 Starting From Log Management Data
More informationTwitter Heron: Stream Processing at Scale
Twitter Heron: Stream Processing at Scale Saiyam Kohli December 8th, 2016 CIS 611 Research Paper Presentation -Sun Sunnie Chung TWITTER IS A REAL TIME ABSTRACT We process billions of events on Twitter
More informationReal-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b
4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 1
More informationAdaptive selfcalibration for Allen Telescope Array imaging
Adaptive selfcalibration for Allen Telescope Array imaging Garrett Keating, William C. Barott & Melvyn Wright Radio Astronomy laboratory, University of California, Berkeley, CA, 94720 ABSTRACT Planned
More informationInstalling and Configuring Apache Storm
3 Installing and Configuring Apache Storm Date of Publish: 2018-08-30 http://docs.hortonworks.com Contents Installing Apache Storm... 3...7 Configuring Storm for Supervision...8 Configuring Storm Resource
More informationFlying Faster with Heron
Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron TALK OUTLINE BEGIN I! II ( III b OVERVIEW MOTIVATION HERON IV Z OPERATIONAL EXPERIENCES V K HERON PERFORMANCE END [! OVERVIEW TWITTER IS
More informationData Processing for the Square Kilometre Array Telescope
Data Processing for the Square Kilometre Array Telescope Streaming Workshop Indianapolis th October Bojan Nikolic Astrophysics Group, Cavendish Lab University of Cambridge Email: b.nikolic@mrao.cam.ac.uk
More information10/26/2017 Sangmi Lee Pallickara Week 10- B. CS535 Big Data Fall 2017 Colorado State University
CS535 Big Data - Fall 2017 Week 10-A-1 CS535 BIG DATA FAQs Term project proposal Feedback for the most of submissions are available PA2 has been posted (11/6) PART 2. SCALABLE FRAMEWORKS FOR REAL-TIME
More informationData Analytics with HPC. Data Streaming
Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationSKA Regional Centre Activities in Australasia
SKA Regional Centre Activities in Australasia Dr Slava Kitaeff CSIRO-ICRAR APSRC Project Engineer ERIDANUS National Project Lead Why SKA Regional Centres? SKA 1 Observatory Compute capacity: 100 Pflops
More informationStrategies for real-time event processing
SAMPLE CHAPTER Strategies for real-time event processing Sean T. Allen Matthew Jankowski Peter Pathirana FOREWORD BY Andrew Montalenti MANNING Storm Applied by Sean T. Allen Matthew Jankowski Peter Pathirana
More informationR-Storm: A Resource-Aware Scheduler for STORM. Mohammad Hosseini Boyang Peng Zhihao Hong Reza Farivar Roy Campbell
R-Storm: A Resource-Aware Scheduler for STORM Mohammad Hosseini Boyang Peng Zhihao Hong Reza Farivar Roy Campbell Introduction STORM is an open source distributed real-time data stream processing system
More informationFACT-Tools. Processing High- Volume Telescope Data. Jens Buß, Christian Bockermann, Kai Brügge, Maximilian Nöthe. for the FACT Collaboration
FACT-Tools Processing High- Volume Telescope Data Jens Buß, Christian Bockermann, Kai Brügge, Maximilian Nöthe for the FACT Collaboration Outline FACT telescope and challenges Analysis requirements Fact-Tools
More informationHardware Acceleration of Pulsar Search on FPGAs using OpenCL
Hardware Acceleration of Pulsar Search on FPGAs using OpenCL Oliver Sinnen Haomiao Wang & Prabu Thiagaraj (Manchester Uni) Parallel and Reconfigurable Computing Department of Electrical and Computer Engineering
More informationTyphoon: An SDN Enhanced Real-Time Big Data Streaming Framework
Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Junguk Cho, Hyunseok Chang, Sarit Mukherjee, T.V. Lakshman, and Jacobus Van der Merwe 1 Big Data Era Big data analysis is increasingly common
More informationREAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT ABSTRACT Du-Hyun Hwang, Yoon-Ki Kim and Chang-Sung Jeong Department of Electrical Engineering, Korea University, Seoul, Republic
More informationA Study of Load Balancing between Sensors and the Cloud for a Real-Time Video Streaming Analysis Application Framework
A Study of Load Balancing between Sensors and the Cloud for a Real-Time Video Streaming Analysis Application Framework ABSTRACT Yuko Kurosaki Ochanomizu University 2 1 1 Otsuka Tokyo, Japan yuko-k@ogl.is.ocha.ac.jp
More informationStorm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter
Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Storm at Twitter Twitter Web Analytics Before Storm Queues Workers Example (simplified) Example Workers schemify tweets and
More informationTime Domain Alerts from LSST & ZTF
Time Domain Alerts from LSST & ZTF Eric Bellm University of Washington LSST DM Alert Production Science Lead ZTF Project Scientist -> ZTF Survey Scientist for the LSST Data Management Team and the ZTF
More informationSKA SDP : A snapshot of recent technical directions and conclusions
SKA SDP : A snapshot of recent technical directions and conclusions Chris Broekema ASTRON Netherlands Institute for Radio Astronomy Highlights A whirlwind overview of recent prototyping and design work
More informationPARALLEL CLASSIFICATION ALGORITHMS
PARALLEL CLASSIFICATION ALGORITHMS By: Faiz Quraishi Riti Sharma 9 th May, 2013 OVERVIEW Introduction Types of Classification Linear Classification Support Vector Machines Parallel SVM Approach Decision
More informationComputational issues for HI
Computational issues for HI Tim Cornwell, Square Kilometre Array How SKA processes data Science Data Processing system is part of the telescope Only one system per telescope Data flow so large that dedicated
More informationHortonworks Cybersecurity Package
Tuning Guide () docs.hortonworks.com Hortonworks Cybersecurity : Tuning Guide Copyright 2012-2018 Hortonworks, Inc. Some rights reserved. Hortonworks Cybersecurity (HCP) is a modern data application based
More informationData mining and Knowledge Discovery Resources for Astronomy in the Web 2.0 Age
Data mining and Knowledge Discovery Resources for Astronomy in the Web 2.0 Age Stefano Cavuoti Department of Physics University Federico II Napoli INAF Capodimonte Astronomical Observatory Napoli Massimo
More informationA Decision Support System for Automated Customer Assistance in E-Commerce Websites
, June 29 - July 1, 2016, London, U.K. A Decision Support System for Automated Customer Assistance in E-Commerce Websites Miri Weiss Cohen, Yevgeni Kabishcher, and Pavel Krivosheev Abstract In this work,
More informationUMP Alert Engine. Status. Requirements
UMP Alert Engine Status Requirements Goal Terms Proposed Design High Level Diagram Alert Engine Topology Stream Receiver Stream Router Policy Evaluator Alert Publisher Alert Topology Detail Diagram Alert
More informationStorm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter
Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Basic info Open sourced September 19th Implementation is 15,000 lines of code Used by over 25 companies >2700 watchers on Github
More informationSquare Kilometre Array
Square Kilometre Array C4SKA 16 February 2018 David Luchetti Australian SKA Project Director Presentation 1. Video of the Murchison Radioastronomy Observatory 2. Is SKA a Collaborative project? 3. Australia/New
More informationPaaS SAE Top3 SuperAPP
PaaS SAE Top3 SuperAPP PaaS SAE Top3 SuperAPP Pla$orm Services Group Sam Biwing Monika Rambone Skylee Kingho1d AWS S3 CDN ATS 1k 30+ 10+ Go FE Services Panel C++ Go C/C++ ACM FE Pla$orm Services Group
More information10/24/2017 Sangmi Lee Pallickara Week 10- A. CS535 Big Data Fall 2017 Colorado State University
CS535 Big Data - Fall 2017 Week 10-A-1 CS535 BIG DATA FAQs Term project proposal Feedback for the most of submissions are available PA2 has been posted (11/6) PART 2. SCALABLE FRAMEWORKS FOR REAL-TIME
More informationWorking with Storm Topologies
3 Working with Storm Topologies Date of Publish: 2018-08-13 http://docs.hortonworks.com Contents Packaging Storm Topologies... 3 Deploying and Managing Apache Storm Topologies...4 Configuring the Storm
More informationNew Zealand Involvement in Solving the SKA Computing Challenges
New Zealand Involvement in Solving the SKA Computing Challenges D R ANDREW E N S O R D I R ECTO R H P C R ESEARC H L A B O R ATORY/ D I R ECTOR N Z SKA ALLIANCE COMPUTING FO R S K A COLLO Q U I UM 2 0
More informationPriority Based Resource Scheduling Techniques for a Multitenant Stream Processing Platform
Priority Based Resource Scheduling Techniques for a Multitenant Stream Processing Platform By Rudraneel Chakraborty A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment
More informationScale-Out Algorithm For Apache Storm In SaaS Environment
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department
More informationThis Time. fmri Data analysis
This Time Reslice example Spatial Normalization Noise in fmri Methods for estimating and correcting for physiologic noise SPM Example Spatial Normalization: Remind ourselves what a typical functional image
More informationCorrelX: A Cloud-Based VLBI Correlator
CorrelX: A Cloud-Based VLBI Correlator V. Pankratius, A. J. Vazquez, P. Elosegui Massachusetts Institute of Technology Haystack Observatory pankrat@mit.edu, victorpankratius.com 5 th International VLBI
More informationA Systematic Overview of Data Mining Algorithms
A Systematic Overview of Data Mining Algorithms 1 Data Mining Algorithm A well-defined procedure that takes data as input and produces output as models or patterns well-defined: precisely encoded as a
More informationSTORM AND LOW-LATENCY PROCESSING.
STORM AND LOW-LATENCY PROCESSING Low latency processing Similar to data stream processing, but with a twist Data is streaming into the system (from a database, or a netk stream, or an HDFS file, or ) We
More informationBig Data Infrastructures & Technologies
Big Data Infrastructures & Technologies Data streams and low latency processing DATA STREAM BASICS What is a data stream? Large data volume, likely structured, arriving at a very high rate Potentially
More information5 member consortium o University of Strathclyde o Wideblue o National Nuclear Laboratory o Sellafield Ltd o Inspectahire
3 year, 1.24M Innovate UK funded Collaborative Research and Development Project (Nuclear Call) o Commenced April 2015 o Follows on from a successful 6 month Innovate UK funded feasibility study 2013-2014
More information2/20/2019 Week 5-B Sangmi Lee Pallickara
2/20/2019 - Spring 2019 Week 5-B-1 CS535 BIG DATA FAQs PART A. BIG DATA TECHNOLOGY 4. REAL-TIME STREAMING COMPUTING MODELS: APACHE STORM AND TWITTER HERON Special GTA for PA1 Saptashwa Mitra Saptashwa.Mitra@colostate.edu
More informationA Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York
A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine
More informationASKAP Pipeline processing and simulations. Dr Matthew Whiting ASKAP Computing, CSIRO May 5th, 2010
ASKAP Pipeline processing and simulations Dr Matthew Whiting ASKAP Computing, CSIRO May 5th, 2010 ASKAP Computing Team Members Team members Marsfield: Tim Cornwell, Ben Humphreys, Juan Carlos Guzman, Malte
More informationEvaluation of Apache Kafka in Real-Time Big Data Pipeline Architecture
Evaluation of Apache Kafka in Real-Time Big Data Pipeline Architecture Thandar Aung, Hla Yin Min, Aung Htein Maw University of Information Technology Yangon, Myanmar thandaraung@uit.edu.mm, hlayinmin@uit.edu.mm,
More informationApplying Model Predictive Control Architecture for Efficient Autonomous Data Collection and Operations on Planetary Missions
Applying Model Predictive Control Architecture for Efficient Autonomous Data Collection and Operations on Planetary Missions M. Lieber, E. Schindhelm, R. Rohrschneider, C. Weimer, S. Roark Cahill Center
More informationFunctional Comparison and Performance Evaluation. Huafeng Wang Tianlun Zhang Wei Mao 2016/11/14
Functional Comparison and Performance Evaluation Huafeng Wang Tianlun Zhang Wei Mao 2016/11/14 Overview Streaming Core MISC Performance Benchmark Choose your weapon! 2 Continuous Streaming Micro-Batch
More informationNOWADAYS, many modern applications require largescale
Automating Characterization Deployment in Distributed Data Stream Management Systems Chunkai Wang, Xiaofeng Meng, Member, IEEE, Qi Guo, Zujian Weng, and Chen Yang Abstract Distributed data stream management
More informationAn Evaluation of Data Stream Processing Systems for Data Driven Applications
Procedia Computer Science Volume 80, 2016, Pages 439 449 ICCS 2016. The International Conference on Computational Science An Evaluation of Data Stream Processing Systems for Data Driven Applications Jonathan
More informationCommunity edition(open-source) Enterprise edition
Suseela Bhaskaruni Rapid Miner is an environment for machine learning and data mining experiments. Widely used for both research and real-world data mining tasks. Software versions: Community edition(open-source)
More informationAccelerating Convolutional Neural Nets. Yunming Zhang
Accelerating Convolutional Neural Nets Yunming Zhang Focus Convolutional Neural Nets is the state of the art in classifying the images The models take days to train Difficult for the programmers to tune
More informationPARALLEL PROGRAMMING MANY-CORE COMPUTING: THE LOFAR SOFTWARE TELESCOPE (5/5)
PARALLEL PROGRAMMING MANY-CORE COMPUTING: THE LOFAR SOFTWARE TELESCOPE (5/5) Rob van Nieuwpoort Vrije Universiteit Amsterdam & Astron, the Netherlands Institute for Radio Astronomy Why Radio? Credit: NASA/IPAC
More informationThe PRISM infrastructure System Architecture and User Interface
The PRISM infrastructure System Architecture and User Interface Claes Larsson ECMWF UK PRISM Hamburg D&M presentation p.1/20 Architecture goals PRISM architecture to provide an efficient climate modelling
More informationSource, Sink, and Processor Configuration Values
3 Source, Sink, and Processor Configuration Values Date of Publish: 2018-12-18 https://docs.hortonworks.com/ Contents... 3 Source Configuration Values...3 Processor Configuration Values... 5 Sink Configuration
More informationINTERFERENCE. where, m = 0, 1, 2,... (1.2) otherwise, if it is half integral multiple of wavelength, the interference would be destructive.
1.1 INTERFERENCE When two (or more than two) waves of the same frequency travel almost in the same direction and have a phase difference that remains constant with time, the resultant intensity of light
More informationFunctional Comparison and Performance Evaluation 毛玮王华峰张天伦 2016/9/10
Functional Comparison and Performance Evaluation 毛玮王华峰张天伦 2016/9/10 Overview Streaming Core MISC Performance Benchmark Choose your weapon! 2 Continuous Streaming Ack per Record Storm* Twitter Heron* Storage
More informationAccelerating the acceleration search a case study. By Chris Laidler
Accelerating the acceleration search a case study By Chris Laidler Optimization cycle Assess Test Parallelise Optimise Profile Identify the function or functions in which the application is spending most
More informationAdaptive Online Scheduling in Storm
Adaptive Online Scheduling in Storm Leonardo Aniello aniello@dis.uniroma1.it Roberto Baldoni baldoni@dis.uniroma1.it Leonardo Querzoni querzoni@dis.uniroma1.it Research Center on Cyber Intelligence and
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationPARALLEL ID3. Jeremy Dominijanni CSE633, Dr. Russ Miller
PARALLEL ID3 Jeremy Dominijanni CSE633, Dr. Russ Miller 1 ID3 and the Sequential Case 2 ID3 Decision tree classifier Works on k-ary categorical data Goal of ID3 is to maximize information gain at each
More informationMining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.
DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete
More informationComputer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1
Pipelining and Vector Processing Parallel Processing: The term parallel processing indicates that the system is able to perform several operations in a single time. Now we will elaborate the scenario,
More informationWP 14 and Timing Sync
WP 14 and Timing Sync Eiscat Technical meeting 20131105 Leif Johansson National Instruments Eiscat Syncronisation Signal vs. Time-Based Synchronization Signal-Based Share Physical Clocks / Triggers Time-Based
More informationDeveloping A Universal Radio Astronomy Backend. Dr. Ewan Barr, MPIfR Backend Development Group
Developing A Universal Radio Astronomy Backend Dr. Ewan Barr, MPIfR Backend Development Group Overview Why is it needed? What should it do? Key concepts and technologies Case studies: MeerKAT FBF and APSUSE
More informationOver the last few years, we have seen a disruption in the data management
JAYANT SHEKHAR AND AMANDEEP KHURANA Jayant is Principal Solutions Architect at Cloudera working with various large and small companies in various Verticals on their big data and data science use cases,
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationME scope Application Note 36 ODS & Mode Shape Animation
Requirements for Animation ME scope Application Note 36 ODS & Mode Shape Animation The following steps are required in order to display shapes in animation on a structure model, 1. Create a Structure Model
More informationAudioSet: Real-world Audio Event Classification
AudioSet: Real-world Audio Event Classification g.co/audioset Rif A. Saurous, Shawn Hershey, Dan Ellis, Aren Jansen and the Google Sound Understanding Team 2017-10-20 Outline The Early Years: Weakly-Supervised
More informationData Quality Monitoring at CMS with Machine Learning
Data Quality Monitoring at CMS with Machine Learning July-August 2016 Author: Aytaj Aghabayli Supervisors: Jean-Roch Vlimant Maurizio Pierini CERN openlab Summer Student Report 2016 Abstract The Data Quality
More informationChapter 5: Stream Processing. Big Data Management and Analytics 193
Chapter 5: Big Data Management and Analytics 193 Today s Lesson Data Streams & Data Stream Management System Data Stream Models Insert-Only Insert-Delete Additive Streaming Methods Sliding Windows & Ageing
More informationCloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018
Cloudline Autonomous Driving Solutions Accelerating insights through a new generation of Data and Analytics October, 2018 HPE big data analytics solutions power the data-driven enterprise Secure, workload-optimized
More informationApache Flink. Alessandro Margara
Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate
More informationOPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS
OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS 1 Why GPUs? A Tale of Numbers 100x Performance Increase Infrastructure Cost Savings Performance 100x gains over traditional
More informationksa 400 Growth Rate Analysis Routines
k-space Associates, Inc., 2182 Bishop Circle East, Dexter, MI 48130 USA ksa 400 Growth Rate Analysis Routines Table of Contents ksa 400 Growth Rate Analysis Routines... 2 1. Introduction... 2 1.1. Scan
More informationSKA Technical developments relevant to the National Facility. Keith Grainge University of Manchester
SKA Technical developments relevant to the National Facility Keith Grainge University of Manchester Talk Overview SKA overview Receptors Data transport and network management Synchronisation and timing
More informationAudio-coding standards
Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.
More informationStatus hardware, software and what next? Harry Keizer 1, Simon Bijlsma 1, Daniela de Paulis 1,3,Marc Wolf 1,2.
SETI@CAMRAS 2017 Status hardware, software and what next? Harry Keizer 1, Simon Bijlsma 1, Daniela de Paulis 1,3,Marc Wolf 1,2 1 Stichting CAMRAS, 2 University of Central Lancashire, 3 University of Plymouth
More informationApache Flink. Fuchkina Ekaterina with Material from Andreas Kunft -TU Berlin / DIMA; dataartisans slides
Apache Flink Fuchkina Ekaterina with Material from Andreas Kunft -TU Berlin / DIMA; dataartisans slides What is Apache Flink Massive parallel data flow engine with unified batch-and streamprocessing CEP
More informationComputing seismic QC attributes, fold and offset sampling in RadExPro software Rev
Computing seismic QC attributes, fold and offset sampling in RadExPro software Rev. 22.12.2016 In this tutorial, we will show how to compute the seismic QC attributes (amplitude and frequency characteristics,
More informationChallenges in Data Stream Processing
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria
More informationArticulated Pose Estimation with Flexible Mixtures-of-Parts
Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:
More informationHadoop ecosystem. Nikos Parlavantzas
1 Hadoop ecosystem Nikos Parlavantzas Lecture overview 2 Objective Provide an overview of a selection of technologies in the Hadoop ecosystem Hadoop ecosystem 3 Hadoop ecosystem 4 Outline 5 HBase Hive
More informationDESIGN AND IMPLEMENTATION OF VISUAL FEEDBACK FOR AN ACTIVE TRACKING
DESIGN AND IMPLEMENTATION OF VISUAL FEEDBACK FOR AN ACTIVE TRACKING Tomasz Żabiński, Tomasz Grygiel, Bogdan Kwolek Rzeszów University of Technology, W. Pola 2, 35-959 Rzeszów, Poland tomz, bkwolek@prz-rzeszow.pl
More informationPutting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21
Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationHortonworks Cybersecurity Platform
Tuning Guide () docs.hortonworks.com Hortonworks Cybersecurity : Tuning Guide Copyright 2012-2018 Hortonworks, Inc. Some rights reserved. Hortonworks Cybersecurity (HCP) is a modern data application based
More informationIntelligent Services Serving Machine Learning
Intelligent Services Serving Machine Learning Joseph E. Gonzalez jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley joseph@dato.com; Co-Founder @ Dato Inc. Contemporary Learning Systems Big Data
More informationIPTA student workshop 2017: Day 1, preparing timing data with psrchive
IPTA student workshop 2017: Day 1, preparing timing data with psrchive June 26, 2017 Kuo Liu 1, Stefan Os lowski 2 1 Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn, Germany 2 Centre
More informationSpark Overview. Professor Sasu Tarkoma.
Spark Overview 2015 Professor Sasu Tarkoma www.cs.helsinki.fi Apache Spark Spark is a general-purpose computing framework for iterative tasks API is provided for Java, Scala and Python The model is based
More informationGeoLas Consulting All rights reserved. LiDAR GeocodeWF Rev. 03/2012 Specifications subject to change without notice.
GeocodeWF is the tool for converting the raw waveform data collected by Riegl LMS-Q560 and LMS-Q680 laserscanner-based lidar systems into geocoded points in a projected coordinate system. GeocodeWF is
More information