Survey of the Mathematics of Big Data

Similar documents
The Mathematics of Big Data

Rectangular Coordinates in Space

Image Processing. Filtering. Slide 1

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

Preparation Meeting. Recent Advances in the Analysis of 3D Shapes. Emanuele Rodolà Matthias Vestner Thomas Windheuser Daniel Cremers

Discrete Wavelets and Image Processing

Compressive Sensing for Multimedia. Communications in Wireless Sensor Networks

Tomographic reconstruction: the challenge of dark information. S. Roux

Research in Computational Differential Geomet

Modern Signal Processing and Sparse Coding

Introduction to the Mathematics of Big Data. Philippe B. Laval

Massive Data Analysis

Compressed Sensing and Applications by using Dictionaries in Image Processing

What is a... Manifold?

Planar Graphs and Surfaces. Graphs 2 1/58

Tracking Computer Vision Spring 2018, Lecture 24

CHAPTER 9 INPAINTING USING SPARSE REPRESENTATION AND INVERSE DCT

Video Compression An Introduction

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

The Normal Curve. June 20, Bryan T. Karazsia, M.A.

Volume 2, Issue 9, September 2014 ISSN

Redundant Data Elimination for Image Compression and Internet Transmission using MATLAB

G009 Scale and Direction-guided Interpolation of Aliased Seismic Data in the Curvelet Domain

Data-Driven Modeling. Scientific Computation J. NATHAN KUTZ OXPORD. Methods for Complex Systems & Big Data

Digital Signal Processing. Soma Biswas

CS 521 Data Mining Techniques Instructor: Abdullah Mueen

Module 9 : Numerical Relaying II : DSP Perspective

Application of Daubechies Wavelets for Image Compression

EE123 Digital Signal Processing

CS 335 Graphics and Multimedia. Image Compression

An introduction to JPEG compression using MATLAB

More Formulas: circles Elementary Education 12

Introduction to Functions of Several Variables

Outcomes List for Math Multivariable Calculus (9 th edition of text) Spring

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

Game Mathematics. (12 Week Lesson Plan)

4. Image Retrieval using Transformed Image Content

DCT image denoising: a simple and effective image denoising algorithm

Review for the Final

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction

Geometry. Course Requirements

Compressive Sensing for High-Dimensional Data

(Refer Slide Time 3:31)

G Practical Magnetic Resonance Imaging II Sackler Institute of Biomedical Sciences New York University School of Medicine. Compressed Sensing

A Brief Look at Optimization

Measurements and Bits: Compressed Sensing meets Information Theory. Dror Baron ECE Department Rice University dsp.rice.edu/cs

diverging. We will be using simplified symbols of ideal lenses:

Computational QC Geometry: A tool for Medical Morphometry, Computer Graphics & Vision

CPSC 535 Assignment 1: Introduction to Matlab and Octave

Engineering Mathematics II Lecture 16 Compression

Fourier Transforms and Signal Analysis

MRT based Fixed Block size Transform Coding

MATH 498 Brief Description of Some Possible Topics for the Senior Assignment Fall 2007

RGB Digital Image Forgery Detection Using Singular Value Decomposition and One Dimensional Cellular Automata

Efficient Imaging Algorithms on Many-Core Platforms

Compressed Point Cloud Visualizations

Final Review. Image Processing CSE 166 Lecture 18

DYADIC WAVELETS AND DCT BASED BLIND COPY-MOVE IMAGE FORGERY DETECTION

ESE 531: Digital Signal Processing

Lossy Compression of Scientific Data with Wavelet Transforms

(Refer Slide Time 00:17) Welcome to the course on Digital Image Processing. (Refer Slide Time 00:22)

SEG/New Orleans 2006 Annual Meeting

Compressive Sensing Based Image Reconstruction using Wavelet Transform

Scientific Visualization Example exam questions with commented answers

IMAGE COMPRESSION USING FOURIER TRANSFORMS

Image Compression. CS 6640 School of Computing University of Utah

x 2 + 3, r 4(x) = x2 1

An Introduc+on to Mathema+cal Image Processing IAS, Park City Mathema2cs Ins2tute, Utah Undergraduate Summer School 2010

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time

Math 144 Activity #4 Connecting the unit circle to the graphs of the trig functions

PRIMES Circle: High School Math Enrichment at MIT

Replacement of Missing Data and Outliers Using Wavelet Transform Methods

Isn t it Saturday? IMO Trainning Camp NUK, 2004_06_19 2

P257 Transform-domain Sparsity Regularization in Reconstruction of Channelized Facies

Non-Differentiable Image Manifolds

Transportation Informatics Group, ALPEN-ADRIA University of Klagenfurt. Transportation Informatics Group University of Klagenfurt 12/24/2009 1

MAS.963 Special Topics: Computational Camera and Photography

Illustrated Guide to the. UTeach. Electronic Portfolio

Lecture 12 Level Sets & Parametric Transforms. sec & ch. 11 of Machine Vision by Wesley E. Snyder & Hairong Qi

Error-Correcting Codes

SAMLab Tip Sheet #1 Translating Mathematical Formulas Into Excel s Language

Chapter 7. Conclusions and Future Work

Introduction to Topics in Machine Learning

Block Diagram. Physical World. Image Acquisition. Enhancement and Restoration. Segmentation. Feature Selection/Extraction.

Program Changes Computer Systems Engineering

On the Shape of Data

Image deblurring by multigrid methods. Department of Physics and Mathematics University of Insubria

Program Changes Communications Engineering

VLSI Implementation of Daubechies Wavelet Filter for Image Compression

Optimal transport and redistricting: numerical experiments and a few questions. Nestor Guillen University of Massachusetts

Incoherent noise suppression with curvelet-domain sparsity Vishal Kumar, EOS-UBC and Felix J. Herrmann, EOS-UBC

Limited view X-ray CT for dimensional analysis

One of the big complaints from remote

Integrated Math, Part C Chapter 1 SUPPLEMENTARY AND COMPLIMENTARY ANGLES

MITOCW watch?v=kz7jjltq9r4

Structurally Random Matrices

Name Course Days/Start Time

Wavelet Transform (WT) & JPEG-2000

Image Compression Using Modified Fast Haar Wavelet Transform

Cryptography. What is Cryptography?

Transcription:

Survey of the Mathematics of Big Data Issues with Big Data, Mathematics to the Rescue Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) Math & Big Data Fall 2015 1 / 28

Introduction We survey some mathematical techniques used with Big Data. The goal here is to make you aware of these techniques rather than giving you detail about them. This will be done later on, at least for some of the techniques we survey. Philippe B. Laval (KSU) Math & Big Data Fall 2015 2 / 28

Outline 1 Issues with Big Data 2 How Can Mathematics Help Types of Data Mathematics to the Rescue! 3 Acquisition of Data Information Content Compressed Sensing 4 Data Analysis 5 Conclusion High Dimensional Data Imaging Data Philippe B. Laval (KSU) Math & Big Data Fall 2015 3 / 28

Issues with Big Data We live in a digital world, which generates a lot of data. New technologies produce enormous amount of data. We are gathering more data than ever, even from old technologies. Problem: Acquisition/storage, analysis and transmission of data. Total data generated > total storage. Increase in generation rate >> increase in communication rate. Analysis can be very complex. Problem: Data is noisy, unstructured and dynamic. Philippe B. Laval (KSU) Math & Big Data Fall 2015 4 / 28

How Can Mathematics Help? One answer is given in a video from a previous lecture, remembering that many of the skills mathematicians have are needed in programming. But also, mathematics......allows us to formalize both the data and the problem....provides a big "chest" of tools or methodologies....allows validation of the methodologies (proof of functionality). Philippe B. Laval (KSU) Math & Big Data Fall 2015 5 / 28

How Can Mathematics Help? - Types of Data Signals. Can be represented by a function f : R R Philippe B. Laval (KSU) Math & Big Data Fall 2015 6 / 28

How Can Mathematics Help? - Types of Data Signals. Can be represented by a function f : R R Once the signal is represented by f, we can: Enhance the signal. Philippe B. Laval (KSU) Math & Big Data Fall 2015 7 / 28

How Can Mathematics Help? - Types of Data Signals. Can be represented by a function f : R R Once the signal is represented by f, we can: Enhance the signal. Remove noise from the signal (PDE techniques, Fourier transform, wavelets, statistics) Philippe B. Laval (KSU) Math & Big Data Fall 2015 7 / 28

How Can Mathematics Help? - Types of Data Signals. Can be represented by a function f : R R Once the signal is represented by f, we can: Enhance the signal. Remove noise from the signal (PDE techniques, Fourier transform, wavelets, statistics) Extract features from the signal. Philippe B. Laval (KSU) Math & Big Data Fall 2015 7 / 28

How Can Mathematics Help? - Types of Data Signals. Can be represented by a function f : R R Once the signal is represented by f, we can: Enhance the signal. Remove noise from the signal (PDE techniques, Fourier transform, wavelets, statistics) Extract features from the signal. The next slide shows a clean signal (red), some noise (green) and a noisy signal. Philippe B. Laval (KSU) Math & Big Data Fall 2015 7 / 28

How Can Mathematics Help? - Types of Data Philippe B. Laval (KSU) Math & Big Data Fall 2015 8 / 28

How Can Mathematics Help? - Types of Data Images. Can be represented by a function f : R 2 R Philippe B. Laval (KSU) Math & Big Data Fall 2015 9 / 28

How Can Mathematics Help? - Types of Data Images. Can be represented by a function f : R 2 R Put a grid on the image (the finer the grid, the highest the resolution). Each cell of the grid contains a positive integer indicating a gray level (for gray images). That s the function f. For color images (RGB), each cell is a vector that contains 3 positive integers indicating the amount of Red, Green and Blue. As for signals, once we have f we can enhance the image, remove noise from it, extract features. Philippe B. Laval (KSU) Math & Big Data Fall 2015 10 / 28

Experiment: Clean Image and Image With 20% Noise Animated version Philippe B. Laval (KSU) Math & Big Data Fall 2015 11 / 28

Experiment: Clean Image and Image With 50% Noise Animated version Philippe B. Laval (KSU) Math & Big Data Fall 2015 12 / 28

How Can Mathematics Help? - Types of Data Data on manifolds. Can be represented by a function f : S 2 R Philippe B. Laval (KSU) Math & Big Data Fall 2015 13 / 28

Acquisition of Data As noted above, the main problem is the size of the data. However, this cannot be solved by increasing storage capacity. Possible solutions: Search for better ways to compress. Acquire less data. Philippe B. Laval (KSU) Math & Big Data Fall 2015 14 / 28

Acquisition of Data - Better Ways to Compress Data Compression saves storage but also decreases bandwidth. General ideas behind compression: Take advantage of redundant information. Only keep the most relevant information. Einstein: "Not everything that can be counted counts, and not everything that counts can be counted." Philippe B. Laval (KSU) Math & Big Data Fall 2015 15 / 28

Acquisition of Data - Better Ways to Compress The standard for data compression was JPEG. It is based on the discrete cosine transform (DCT) (see Fourier analysis, Fourier transform, fast Fourier transform (FFT) and the discrete fast Fourier transform (DFFT). It expresses the image in terms of building blocks, the functions cos [ π n ( i + 1 2 ) k ]. Definition The DCT is a linear, invertible function f : R n R n. It transforms the numbers (x 0, x 1,..., x n 1 ) into (X 0, X 1,..., X n 1 ) by the formula n 1 [ ( π X k = x i cos i + 1 ) ] k n 2 i=0 for k = 0, 1,..., n 1 Philippe B. Laval (KSU) Math & Big Data Fall 2015 16 / 28

Acquisition of Data - Better Ways to Compress The latest standard for data compression is JPEG2000. It is similar to JPEG, but it uses wavelets instead of the discrete cosine transform. The branch of mathematics which studies wavelets is called Harmonic Analysis. Like the DCT, wavelets decompose the image in terms of building blocks. But the building blocks are not limited to the cosine function. Both the DCT and wavelets use the divide and conquer method. Decompose an image in terms of its building blocks. Only keep the most relevant building blocks. Philippe B. Laval (KSU) Math & Big Data Fall 2015 17 / 28

Acquisition of Data - Better Ways to Compress Problem: The raw data still has to be stored first Solution: Compressed sensing (Donoho, Candès, Romberg, Tao (2006) Philippe B. Laval (KSU) Math & Big Data Fall 2015 18 / 28

Acquisition of Data - Compressed Sensing Main idea: Goal: Assumption: Only a small percentage of the data to be captured is really relevant (sparse data). Problem: We don t know which. Solution: Use random building blocks. This is pushing the sampling theorem beyond its limits! Goal: To capture a signal with as few points as possible. Thus the raw data will be already compressed. Requires a lot of linear algebra, solving sparse systems. Philippe B. Laval (KSU) Math & Big Data Fall 2015 19 / 28

Applications of Compressed Sensing Business Compression Astronomy Radar Biology Communications Imaging Science Geology Medicine Information theory Remote sensing Optics Philippe B. Laval (KSU) Math & Big Data Fall 2015 20 / 28

Data Analysis There are many techniques, statistical and mathematical which already exist. Problem: The analysis maybe be too complex for current computing power. Data can be noisy, unstructured and dynamic. However, these problems cannot be simply solved by more computing power. Possible solutions include: Graph Theory. TDA (Topological Data Analysis). Philippe B. Laval (KSU) Math & Big Data Fall 2015 21 / 28

Data Analysis - High-Dimensional Data Strategy: Detection of inner structure. Reduction of dimension Examples: A circle can be approximated by lines. A sphere can be approximated by flat surfaces. Philippe B. Laval (KSU) Math & Big Data Fall 2015 22 / 28

Data Analysis - High-Dimensional Data Philippe B. Laval (KSU) Math & Big Data Fall 2015 23 / 28

Data Analysis - High-Dimensional Data Question: How do we identify inner structure? By using topology. The link will open a website. Scroll down to play the video. Philippe B. Laval (KSU) Math & Big Data Fall 2015 24 / 28

Data Analysis - Imaging Data Tasks: Removing Noise. Pattern recognition. Reconstructing missing data.... Tools: Statistics. Fourier and harmonic analysis. Partial differential equations. Compressed sensing. Linear Algebra. Philippe B. Laval (KSU) Math & Big Data Fall 2015 25 / 28

Conclusion Big Data is here to stay. We need to tame it! The techniques presented are fairly diffi cult and require a lot knowledge in various branches of mathematics, statistics and science. Not everybody needs to be a mathematician. But chances are you will work with one. Knowing at least some mathematics will make the communication and the teamwork easier. Questions? Philippe B. Laval (KSU) Math & Big Data Fall 2015 26 / 28

Some Questions 1 Name some new technologies which generate data which were not mentioned during the talk. 2 Give examples of unstructured data not mentioned during the talk. 3 Why is removing noise from data (images as well as other data) important? 4 Give applications of pattern recognition in images. Philippe B. Laval (KSU) Math & Big Data Fall 2015 27 / 28

Online Articles to Read 1 The Mathematical Shape of Things to Come from Quanta Magazine. 2 The Real Secret to Unlocking Big Data? Math. 3 The New Shape of Big Data. 4 New Mathematical Method May Help Tame Big Data from Communications of the ACM. 5 Better Way to Make Sense of Big Data from Science Daily. Philippe B. Laval (KSU) Math & Big Data Fall 2015 28 / 28