Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Transcription:

Observe the novel applicability of DL techniques in Big Data Analytics.
Applications of DL techniques for common Big Data Analytics problems:
- Semantic indexing
- Discriminative tasks
- Semantic tagging
Overview of DL challenges in Big Data Analytics:
- Streaming data
- High-dimensional data
- Large-scale models
- Domain adaptation

- Variety
- Volume
- Velocity
- Veracity
- Data quality and validation
- Feature engineering
(Image: http://crossing-technologies.com/big-data-analytics/)

The primary focus of DL is to extract meaningful features (high-level representations) from data, both labeled and unlabeled. For a Big Data application, it is more useful to extract representations and patterns from unsupervised data. Classically, Big Data has used shallow training architectures, which can capture local relationships; DL algorithms, however, perform better at extracting non-local and global relationships and patterns in the data. The output of DL algorithms can then be used to develop more discriminative and complex models on Big Data.

1. Relatively simple linear models can work effectively with the knowledge obtained from more complex and more abstract data representations.
2. Increased automation of data-representation extraction from unsupervised data enables broad application to different data types: image, text, video.
3. Relational and semantic knowledge can be obtained at the higher levels of abstraction and representation of the raw data, making the features more global.
4. DL algorithms and architectures are more suitable for issues related to the Volume and Variety of Big Data Analytics.

Goal: use semantics to label data for use in information retrieval and storage. This has gained importance since large amounts of data (text, images, video, etc.) are being generated across various domains. Previous solutions face two challenges:
- Massive volume of data (Volume)
- Different data representations (Variety)
(Image: https://towardsdatascience.com/deep-learning-4-embedding-layers-f9a02d55ac12)

Instead of using raw data for indexing, DL can be used to generate high-level abstract data representations; DL reveals complex associations and factors in unsupervised data. A high-level abstract data representation should:
- Be meaningful
- Demonstrate relational and semantic associations
- Not depend on the specifics of the raw data
Data representation plays an important role in indexing: it lets similar representations be stored closer to each other in memory. BUT, how do we represent the data?

The goal of document representation is to create a representation that condenses the specific and unique aspects of a document, e.g. its topic. Historical models are mostly based on word counts and word occurrence: they treat individual words as dimensions, with different dimensions assumed independent; examples include TF-IDF and BM25. In practice, however, word occurrences are highly correlated. Using Deep Learning techniques to extract meaningful data representations reduces the dimensionality of the document representations.
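
As a point of reference for the word-count models above, here is a minimal TF-IDF sketch using scikit-learn; the toy documents are illustrative assumptions, not from the slides.

# Minimal TF-IDF sketch: each word is an independent dimension of the
# document vector, exactly the assumption that DL representations relax.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "deep learning extracts abstract representations",
    "big data analytics needs scalable models",
    "deep models learn representations from big data",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)           # one row per document
print(X.shape)                               # (3, number_of_distinct_words)
print(vectorizer.get_feature_names_out())    # the word "dimensions"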

One option is vector representation, which enables faster search and information retrieval. Each data instance is represented by a vector in a vector space, and instances with similar vector representations are likely to have similar semantic meaning. This allows vector-based comparison, which is more efficient than comparing instances directly on the raw data.
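
A minimal sketch of such a vector-based comparison, using cosine similarity between hypothetical learned representation vectors rather than the raw data:

import numpy as np

def cosine_similarity(a, b):
    # Similar representations point in similar directions.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

doc_a = np.array([0.9, 0.1, 0.4])   # hypothetical learned representations
doc_b = np.array([0.8, 0.2, 0.5])
doc_c = np.array([0.1, 0.9, 0.0])
print(cosine_similarity(doc_a, doc_b))  # high: semantically close
print(cosine_similarity(doc_a, doc_c))  # low: semantically distant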

Words that are similar end up clustering near each other.
(Image: https://www.tensorflow.org/tutorials/representation/word2vec)

An example of vector representation: given a dictionary (corpus) with billions of words as input, generate a vector representation of words where:
- Words with similar semantic meanings are close neighbors of each other
- Semantic indexing helps predict a target word for a given context
Examples: "Paris belongs to France as Berlin belongs to Germany." "Adam was studying in the library." Word vectors with such semantic relationships can be used to improve many existing Natural Language Processing (NLP) applications.
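
A minimal sketch of the Paris/Berlin example as vector arithmetic; the 3-dimensional toy embeddings are illustrative assumptions (real word2vec vectors have hundreds of dimensions):

import numpy as np

# Toy embeddings chosen so that "capital minus country" is a consistent offset.
emb = {
    "Paris":   np.array([0.9, 0.8, 0.1]),
    "France":  np.array([0.1, 0.8, 0.1]),
    "Berlin":  np.array([0.9, 0.1, 0.7]),
    "Germany": np.array([0.1, 0.1, 0.7]),
}

# "Paris belongs to France as ? belongs to Germany"
query = emb["Paris"] - emb["France"] + emb["Germany"]
# The nearest neighbor by Euclidean distance should be "Berlin".
answer = min(emb, key=lambda w: np.linalg.norm(emb[w] - query))
print(answer)  # Berlin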

Paper: Hinton G, Salakhutdinov R (2011) "Discovering binary codes for documents by learning deep generative models." Topics in Cognitive Science 3(1):74-91. A DL model where the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document. By using short binary codes as addresses, retrieval on very large document sets can be performed in time independent of the size of the document set, using only one word of memory to describe each document. Training is slow, but inference is fast. (See: https://en.wikipedia.org/wiki/Hamming_distance)
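
A minimal sketch of retrieval with short binary codes, assuming the codes have already been learned by such a model; the 8-bit codes below are illustrative assumptions:

import numpy as np

# Hypothetical learned 8-bit codes, one per document.
codes = np.array([
    [0, 1, 1, 0, 1, 0, 0, 1],   # doc 0
    [0, 1, 1, 0, 1, 0, 1, 1],   # doc 1 (close to doc 0)
    [1, 0, 0, 1, 0, 1, 1, 0],   # doc 2 (far from doc 0)
])
query = np.array([0, 1, 1, 0, 1, 0, 0, 1])

# Hamming distance: the number of differing bits between two codes.
distances = np.sum(codes != query, axis=1)
print(distances)              # [0 1 8]
print(np.argsort(distances))  # retrieval order: doc 0, doc 1, doc 2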

Deep Learning algorithms make it possible to learn complex non-linear representations of word occurrences. With DL, one can leverage unsupervised data and thereby gain access to a much larger amount of input; there is no need to generate huge labeled datasets to capture complex representations. To accelerate training and improve accuracy, a mixture of labeled and unlabeled data can be used as input. And, just as with textual data, DL can be applied to other kinds of data.

Proposed method:
1. Use DL algorithms to extract complicated non-linear features from the raw data.
2. Then use simpler linear models to perform discriminative tasks on the output of step 1.
Advantages:
- Extracting features with Deep Learning adds non-linearity to the data analysis, tying the discriminative tasks closely to Artificial Intelligence.
- Applying relatively simple linear analytical models to the extracted features is more computationally efficient, which is important for Big Data Analytics.
- Non-linear features learnt through DL from massive amounts of data can be handed to data analysts, who then apply simpler linear models for further analysis.
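
A minimal sketch of this two-step approach, assuming TensorFlow/Keras for the unsupervised feature extractor and scikit-learn for the linear model; the data shapes, placeholder data, and hyperparameters are illustrative assumptions:

import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

# Step 1: learn features from (mostly unlabeled) data with an autoencoder.
x_unlabeled = np.random.rand(10000, 784).astype("float32")     # placeholder data
inputs = tf.keras.Input(shape=(784,))
code = tf.keras.layers.Dense(64, activation="relu")(inputs)    # the learned representation
recon = tf.keras.layers.Dense(784, activation="sigmoid")(code)
autoencoder = tf.keras.Model(inputs, recon)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_unlabeled, x_unlabeled, epochs=5, batch_size=256, verbose=0)

# Step 2: apply a simple linear model to the extracted features.
encoder = tf.keras.Model(inputs, code)
x_labeled = np.random.rand(1000, 784).astype("float32")        # small labeled subset
y_labeled = np.random.randint(0, 10, size=1000)
features = encoder.predict(x_labeled, verbose=0)
clf = LogisticRegression(max_iter=1000).fit(features, y_labeled)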

Discriminative analysis can be the primary purpose of the data analysis, or it can be used to tag data for searching; examples include searching audio/video files by speech, and Google's image search. To deal with searching through large-scale image data, one approach is to automate the process of tagging images and extracting semantic information from them. DL helps with the discriminative task of semantic tagging of data. (Image: http://beyondplm.com/wp-content/uploads/2013/10/tagging-plm-data-300x261.jpg)

In semantic indexing, the focus is on using the Deep Learning abstract representations directly for data indexing. In semantic tagging, the abstract data representations are treated as features for the discriminative task of data tagging. Tags can also be used for data indexing, but the primary idea here is that Deep Learning makes it possible to tag massive amounts of data by applying simple linear models to the complicated features extracted by Deep Learning algorithms.

ImageNet was an example of using DL for object detection in images (already presented). DL algorithms can also be trained without labeled data: Google and Stanford built a very large deep neural network that learned very high-level features, such as face detection or cat detection, from scratch (without any priors), using only unlabeled data. Their work was a large-scale investigation of the feasibility of building high-level features with Deep Learning from unlabeled (unsupervised) data, and it clearly demonstrated the benefits of using Deep Learning with unsupervised data. Based on these features, their approach also outperformed the state of the art in recognizing 22,000 object categories from the ImageNet dataset. Conclusion: in DL, features extracted from one dataset can successfully be used to perform a discriminative task on another dataset.

Detecting the activity or scene in a given context: Deep Learning can be used for action and scene recognition as well as video data tagging. (Image: https://www.researchgate.net/figure/four-action-recognition-cases-considering-object-or-scene-a-joint-objects-and-action_fig1_307515996)

1. DL has shown remarkable results in extracting useful features (representations) for performing discriminative tasks on image/video data.
2. These features (representations) are useful for data tagging and can be used in search engines.
3. High-level complex data representations enable the application of simpler linear models for Big Data Analytics.
4. Future work should determine appropriate objectives for learning good representations.

So far, we have focused on the applicability and benefits of Deep Learning algorithms for Big Data Analytics. However, the characteristics of Big Data also pose challenges to DL:
- Learning with streaming data
- Dealing with high-dimensional data
- Scalability of models
- Distributed computing

One of the challenges in Big Data is handling streaming and fast-moving input data, and DL needs to handle this as well. Some example solutions use:
- Incremental feature learning
- Denoising autoencoders

Denoising autoencoders are a variant of autoencoders that extract features from corrupted input; the extracted features are robust to noisy data and good for classification purposes. A denoising autoencoder has one hidden layer which extracts the features; initially, the number of hidden nodes equals the number of extracted features. Incrementally, the samples that do not satisfy the objective function are reused to add new nodes to the hidden layer, and incoming new data samples are used to jointly retrain all the features.
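
A minimal denoising-autoencoder sketch in TensorFlow/Keras; the input dimensionality, noise level, placeholder data, and hidden-layer size are illustrative assumptions:

import numpy as np
import tensorflow as tf

x_clean = np.random.rand(5000, 784).astype("float32")   # placeholder data
# Corrupt the input with additive Gaussian noise.
x_noisy = x_clean + 0.2 * np.random.randn(*x_clean.shape).astype("float32")

inputs = tf.keras.Input(shape=(784,))
hidden = tf.keras.layers.Dense(128, activation="relu")(inputs)   # the single feature-extracting hidden layer
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(hidden)
dae = tf.keras.Model(inputs, outputs)
dae.compile(optimizer="adam", loss="mse")
# Train to reconstruct the clean input from the corrupted one.
dae.fit(x_noisy, x_clean, epochs=5, batch_size=256, verbose=0)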

This approach is useful in applications where the distribution of data changes over time, as in massive online data streams. Similar features are merged to produce a more compact feature set. Incremental feature learning and extraction can be generalized to other Deep Learning algorithms, such as RBMs. It makes it possible to handle new incoming streams of online large-scale data, and it avoids the expensive cross-validation analysis otherwise needed to select the number of features in large-scale datasets. However, monotonically adding features can lead to many redundant features and to overfitting the data.

Some DL algorithms can become very computationally expensive when dealing with high-dimensional data such as images. The slowdown is caused by the long training time through the layer hierarchy of the DL algorithm, and high dimensionality makes the DL model more complicated to train. Given the large volumes of data in Big Data, these DL algorithms can become bottlenecks. An example of an existing solution: convolutional neural networks.

Convolutional neural networks were used on ImageNet to achieve better results. The neurons in the hidden layers do not need to be fully connected to the previous layer; instead, they are connected only to neurons within the same local spatial area. The resolution of the image data is also reduced when moving toward higher layers in the network. This scales up efficiently to high-dimensional data.
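
A minimal CNN sketch in TensorFlow/Keras illustrating both properties, local connectivity and progressive resolution reduction; the layer sizes are illustrative assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    # Each unit connects only to a local 3x3 spatial neighborhood,
    # not to every unit in the previous layer.
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),   # resolution drops toward higher layers
    tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1000, activation="softmax"),  # e.g. ImageNet-scale output
])
model.summary()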

Main question: how do we scale these achievements up to larger models and larger datasets?
- Support for model parallelism: within a node (multi-threading) and across machines (MPI)
- Support for data parallelism, using high-speed communication infrastructure
- Handle communication, synchronization, and the details of parallelism within the framework; this increases productivity as well as performance!
Examples of such work: DistBelief, COTS HPC. (See: https://xiandong79.github.io/intro-distributed-deep-learning)
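
A minimal data-parallelism sketch using TensorFlow's tf.distribute API, a modern stand-in for the DistBelief-style setups named above (not the systems themselves); the model and placeholder data are illustrative assumptions:

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # replicate the model across local devices
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Each batch is split across replicas; gradients are aggregated automatically.
x = np.random.rand(4096, 784).astype("float32")
y = np.random.randint(0, 10, size=4096)
model.fit(x, y, batch_size=512, epochs=2, verbose=0)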

Determining the optimal number of model parameters in such large-scale models, and improving their computational practicality, are challenges for DL in Big Data Analytics. Besides handling massive volumes of data, large-scale DL models for Big Data Analytics also have to face other Big Data problems, such as domain adaptation and streaming data. These challenges motivate the need for better large-scale DL algorithms and architectures.

Problem: the distribution of the training data (from which the representations are learnt) differs from the distribution of the test data (on which the learnt representations are deployed). This is a known problem in Big Data due to the variation of input data types and domains (Volume + Variety). Which representations should be shared among different domains? This poses a challenge: should the entire Big Data input dataset be used, or only a portion of it? And what volume of data is enough for training to obtain adequate representations?

Future work should focus on addressing one or more of these problems, often seen in Big Data, thus contributing to the Deep Learning and Big Data Analytics research corpus. We need to adapt DL to the challenges associated with Big Data:
- High dimensionality
- Streaming data analysis
- Scalability of Deep Learning models
- Improved formulation of data abstractions
- Distributed computing
- Criteria for extracting good data representations
- Domain adaptation
- Semantic indexing

High Performance Computing, Big Data, and Deep Learning: each of these areas generates new challenges for, and proposes solutions to, the others.