BIG Data How to handle it. Mark Holton, College of Engineering, Swansea University,

BIG Data How to handle it Mark Holton, College of Engineering, Swansea University, m.d.holton@swansea.ac.uk

The usual: what I'm going to talk about
- The source of the data: some tag (data logger) history
- Selection criteria
- Some numbers
- Hardware / storage
- Preparation of the data
- Visualisations
- Where BIG data processing is heading

Selection criteria: to get what we need / want
- Weight
- Durability

Choosing the right tag
- Tags vary greatly in their abilities
- Range of logging frequencies: 1 Hz to 800 Hz, depending on need
- Different on-board sensors can be subsampled relative to the others
- Sensors from different manufacturers have different accuracies / sensitivities
- Potentially 10 or more channels of data:
  - Accelerometer X, Y, Z
  - Magnetometer X, Y, Z
  - Temperature
  - Pressure
  - Light level
  - Battery
  - Speed
  - Humidity
  - GPS / DGPS
  - Differential pressure pitot probe
  - Feeding: inter-mandibular angle sensor (IMASEN)

Tag development

Some numbers
For 10 channels at 40 Hz sampling:
- 400 pieces of data per second
- 24,000 pieces of data per minute
- 1,440,000 pieces of data per hour
- 34,560,000 pieces of data per day
- 1,036,800,000 pieces of data per month (30 days)
This is just the RAW data. On top of it come:
- Smoothed channels
- Various metrics
- Marker / sync channels (GPS etc.)
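The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name `data_points` and the 4-bytes-per-value storage figure are assumptions for illustration and do not come from the talk, and a month is taken as 30 days.

```python
def data_points(channels, rate_hz, days_per_month=30):
    """Count raw values logged per period for a tag with the given channels and rate."""
    per_second = channels * rate_hz
    per_minute = per_second * 60
    per_hour = per_minute * 60
    per_day = per_hour * 24
    per_month = per_day * days_per_month  # assumes a 30-day month by default
    return {
        "second": per_second,
        "minute": per_minute,
        "hour": per_hour,
        "day": per_day,
        "month": per_month,
    }


counts = data_points(channels=10, rate_hz=40)
for period, n in counts.items():
    print(f"per {period}: {n:,}")

# Hypothetical storage estimate: 4 bytes per value is an assumption,
# the real figure depends on the tag's file format.
BYTES_PER_VALUE = 4
raw_month_gb = counts["month"] * BYTES_PER_VALUE / 1e9
print(f"raw data per month: about {raw_month_gb:.1f} GB")
```

Changing `rate_hz` to the upper end of the logging range shows how quickly the totals grow before any smoothed or derived channels are added.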

Hardware: what is needed, actually really needed
- i5-i7 Intel processor
- Lots of memory (16-32 GB)
- Lots of local hard-drive space
- Networked storage

Pre-preparing the data
Raw data can be difficult to understand. Some data preparation can help a lot with interpretation of results.
Simple preparation:
- Channel smoothing
- Median filters
- Threshold filters
- Histograms
- Time filters, i.e. sectioning by the hour, day etc.
Slightly harder preparation:
- Synchronisation of data to GPS data
- 3D normalisation
- Draw scaling and colour scaling
Hard (multiple channel, large processing) preparation:
- Complex 3D plots of multi-channel data (the user has the option to plot any combination of available channels)
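The simple preparation steps can be sketched in plain Python. The talk does not say how its software implements them, so the window handling (shrinking at the edges) and the function names below are assumptions, and the trace is made-up data with one spike.

```python
from statistics import median


def moving_average(values, window):
    """Centred moving average; the window shrinks at the edges of the trace."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def median_filter(values, window):
    """Centred median filter; good at removing isolated spikes."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def threshold_filter(values, low, high):
    """Keep (index, value) pairs whose value lies within [low, high]."""
    return [(i, v) for i, v in enumerate(values) if low <= v <= high]


# Made-up channel data with a single spike at index 3.
trace = [1.0, 1.1, 0.9, 9.0, 1.0, 1.2, 0.8]

print(moving_average(trace, 3))         # the spike is smeared over its neighbours
print(median_filter(trace, 3))          # the spike is removed
print(threshold_filter(trace, 0.0, 2.0))  # the spike is dropped, indices are kept
```

The median filter is usually the better choice when the aim is to remove isolated glitches, while the moving average suits slowly varying noise.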

Visualisations

Fourier Transforms for frequency analysis

Fourier Transforms for frequency analysis
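As an illustration of frequency analysis on a logged channel, the sketch below finds the dominant frequency of a trace. To stay within the standard library it uses a direct discrete Fourier transform rather than a fast Fourier transform; library routines such as `numpy.fft.rfft` compute the same spectrum far more quickly. The 40 Hz rate matches the earlier slide, while the 2 Hz test signal and the function names are assumptions.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum of a real signal using a direct O(n^2) DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the constant offset
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centred))
        mags.append(abs(acc) / n)
    return mags


def dominant_frequency(samples, rate_hz):
    """Frequency (Hz) of the largest non-zero-frequency component."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * rate_hz / len(samples)


RATE_HZ = 40
# Synthetic 2 Hz oscillation sampled for 4 seconds (made-up data).
signal = [math.sin(2 * math.pi * 2.0 * t / RATE_HZ) for t in range(RATE_HZ * 4)]
print(dominant_frequency(signal, RATE_HZ))
```

The frequency resolution is the sampling rate divided by the number of samples, so longer sections of data separate nearby frequencies more cleanly.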

Interpretation of acceleration traces as body pitch and sway
[Figure labels: Pitch]
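One common way to read pitch from an accelerometer is to treat the smoothed (static) acceleration as the direction of gravity. The sketch below assumes that X is the animal's front-to-back axis, that values are in units of g, and that positive pitch means nose up; the talk does not state its axis convention or formula, so these are labelled assumptions. Sway is not covered here.

```python
import math


def pitch_degrees(ax, ay, az):
    """Pitch angle from static acceleration, assuming X is the front-to-back axis."""
    return math.degrees(math.atan2(ax, math.hypot(ay, az)))


# Made-up smoothed acceleration samples (g): level, tilted, vertical.
samples = [
    (0.0, 0.0, 1.0),
    (0.5, 0.0, math.sqrt(0.75)),
    (1.0, 0.0, 0.0),
]
for ax, ay, az in samples:
    print(f"{pitch_degrees(ax, ay, az):.1f} degrees")
```

The calculation is only meaningful on smoothed data, because rapid body movement adds dynamic acceleration on top of gravity.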

Look more at the bulk data
- Acceleration XYZ
- Acceleration XYZ, coloured by pressure
- Acceleration XYZ, coloured by pressure, with the radius modulated by pressure

Spherical histograms, modulated by another channel
- Acceleration XYZ, coloured by pressure
- Acceleration XYZ, coloured by pressure, with a spherical histogram: each block is the sum of one particular attribute of all points that fall within that area
- Acceleration XYZ, coloured by pressure, with the spherical histogram showing the distribution of a different attribute
- Logarithmic spherical histogram
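A spherical histogram of this kind can be sketched by binning the direction of each XYZ point into azimuth/elevation cells and summing an attribute from another channel into each cell. The equal-angle grid, the bin counts and the function name are assumptions; the talk does not describe how its cells are laid out, and equal-angle cells are not equal in area.

```python
import math
from collections import defaultdict


def spherical_histogram(points, values, az_bins=36, el_bins=18):
    """Sum `values` into azimuth/elevation cells given by each point's direction."""
    cells = defaultdict(float)
    for (x, y, z), v in zip(points, values):
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0:
            continue  # a zero vector has no direction
        az = math.atan2(y, x)                        # -pi .. pi
        el = math.asin(max(-1.0, min(1.0, z / r)))   # -pi/2 .. pi/2
        i = min(int((az + math.pi) / (2 * math.pi) * az_bins), az_bins - 1)
        j = min(int((el + math.pi / 2) / math.pi * el_bins), el_bins - 1)
        cells[(i, j)] += v
    return dict(cells)


# Made-up acceleration points with a pressure value attached to each.
points = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.1), (1.0, 0.0, 0.0)]
pressure = [2.0, 3.0, 5.0]
hist = spherical_histogram(points, pressure)
print(hist)
```

A logarithmic version can be obtained by plotting `math.log10(1 + total)` for each cell, which keeps sparsely populated directions visible next to heavily populated ones.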

Acceleration magnitude separated by magnitude into layers

Determining mean angular separation of clusters of data

2-dimensional histograms to show clustering or frequent behaviour across multiple axes
- 2D histograms of various channel data, to aid in looking for behaviours due to unusual/unique clusters
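A 2D histogram over two channels can be built with a counter keyed by bin indices. This is a minimal sketch; the function name, the bin count and the sample values are assumptions made for illustration.

```python
from collections import Counter


def histogram_2d(xs, ys, x_range, y_range, bins=10):
    """Count paired samples into a bins-by-bins grid; out-of-range pairs are skipped."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    counts = Counter()
    for x, y in zip(xs, ys):
        if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi):
            continue
        i = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        counts[(i, j)] += 1
    return counts


# Made-up values for two channels, both normalised to the range 0..1.
channel_a = [0.10, 0.15, 0.12, 0.90]
channel_b = [0.30, 0.35, 0.32, 0.85]
grid = histogram_2d(channel_a, channel_b, (0.0, 1.0), (0.0, 1.0), bins=5)
print(grid.most_common(3))
```

Cells with high counts mark frequent behaviour, and isolated cells with a handful of points are candidates for the unusual clusters the slide mentions.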

Incorporating other systems' data with sensor data: GPS
- GPS trace with associated behaviour/environmentally coloured traces
- GPS track coloured by one of many attributes, including pre-calculated behaviour values
- GPS track coloured by one of many attributes, with additional duplicate environmental and behavioural tracks stacked
- Different view of a section of GPS data

Searching data using thresholds
- Surfacing
- Picking out interesting data using a basic thresholding search algorithm
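A basic thresholding search can be sketched as a scan for contiguous runs of samples that satisfy a threshold. The example treats "surfacing" as the depth channel dropping to or below a small value; that interpretation, the threshold, the function name and the data are all assumptions.

```python
def find_sections(values, threshold, above=False, min_length=1):
    """Return (start, end) index pairs, end exclusive, of runs meeting the threshold."""
    sections = []
    start = None
    for i, v in enumerate(values):
        hit = v >= threshold if above else v <= threshold
        if hit and start is None:
            start = i                      # a run begins
        elif not hit and start is not None:
            if i - start >= min_length:
                sections.append((start, i))
            start = None                   # the run has ended
    if start is not None and len(values) - start >= min_length:
        sections.append((start, len(values)))  # run continues to the end
    return sections


# Made-up depth trace in metres; values near zero are taken to mean surfacing.
depth = [5.0, 2.0, 0.2, 0.1, 0.3, 3.0, 6.0, 0.4, 4.0]
print(find_sections(depth, threshold=0.5))
print(find_sections(depth, threshold=0.5, min_length=2))
```

The `min_length` parameter is a simple way to ignore single-sample blips that would otherwise be reported as events.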

Everyone wants something different
One large difficulty with developing a system to look at this type of data is that EVERYONE wants something different:
- Correlation to compare channels against each other
- Cross-correlation to compare waveform shapes to a database to determine best fit
- Various tests to determine similarity, convergence etc.
For now, make the raw and processed data as accessible as possible to the user, so that post-processing can be carried out in existing statistical/graphing packages such as Origin, MATLAB etc.
- Advanced expression builder for search and tagging of data sections
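Comparing a waveform against a database of shapes can be sketched with a normalised correlation that is slid along the signal. The template names and data below are hypothetical, and the talk does not say which correlation measure its software uses; Pearson correlation is assumed here because it ignores differences in offset and amplitude.

```python
import math


def normalised_correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_match(signal, templates):
    """Slide each template along the signal; return (name, offset, score) of the best fit."""
    best = (None, -1, -2.0)
    for name, template in templates.items():
        for offset in range(len(signal) - len(template) + 1):
            window = signal[offset:offset + len(template)]
            score = normalised_correlation(window, template)
            if score > best[2]:
                best = (name, offset, score)
    return best


# Hypothetical template database and a made-up signal containing a scaled ramp.
templates = {"spike": [0.0, 1.0, 0.0], "ramp": [0.0, 1.0, 2.0, 3.0]}
signal = [0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 0.0, 0.0]
name, offset, score = best_match(signal, templates)
print(name, offset, round(score, 3))
```

The same `normalised_correlation` function can be applied to two whole channels of equal length to compare them against each other.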

The future of BIG data processing
- It's easy for the human eye to see patterns in large, often complex, data sets
- Data sets containing multiple variables can very quickly become a task beyond standard mathematics and will often require something new
- Analysis of data using neural networks has been around for decades:
  - Create a series of Input neurons, or an Input layer
  - Create a Hidden layer to accept outputs from the Input layer (or not)
  - Create an Output layer that effectively summarises the results from the Hidden layer
The network is then trained with data sets, and the weightings that link the Input layer to the Hidden layer and the Hidden layer to the Output layer are mathematically adjusted to achieve a known result based on the known input data.
There are many different algorithms around today, with varying strengths and weaknesses depending on the type of data set. The oldest and most typical is the Feed Forward Neural Network, which relies on the back-propagation of error for training. Such a network, if trained correctly for certain types of data sets, will be able to identify a previously unseen data set to a degree of probability. More complex networks include delays and loop-backs (recurrent connections) to earlier layers, allowing a sort of memory of previous input sets.
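The input, hidden and output layers and the weight adjustment described above can be sketched as a very small feed-forward network trained by back-propagation. Everything specific here is an assumption for illustration: the class name, the sigmoid activation, the squared-error loss, the learning rate, the four hidden neurons and the XOR toy data. It is not the network used in the talk.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """One hidden layer, one output neuron; each weight row ends with a bias weight."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, inputs):
        x = list(inputs) + [1.0]  # append the constant bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, inputs, target, rate=0.5):
        """One back-propagation update for a single training example."""
        hidden, output = self.forward(inputs)
        x = list(inputs) + [1.0]
        # Error signal at the output, then propagated back to each hidden neuron
        # using the output weights as they were before this update.
        delta_out = (output - target) * output * (1.0 - output)
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        self.w_out = [w - rate * delta_out * v
                      for w, v in zip(self.w_out, hidden + [1.0])]
        for j, row in enumerate(self.w_hidden):
            self.w_hidden[j] = [w - rate * delta_hidden[j] * v
                                for w, v in zip(row, x)]


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy training set (XOR): known inputs paired with known results.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

net = FeedForwardNet(n_in=2, n_hidden=4)
loss_before = mean_squared_error(net, XOR)
for _ in range(5000):
    for x, t in XOR:
        net.train_step(x, t)
loss_after = mean_squared_error(net, XOR)
print(f"error before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

The output of `forward` lies between 0 and 1, which is one way to read the "degree of probability" with which a trained network identifies an unseen input.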

Thank you