ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 9. Prof. James She


1 ELEC6910Q Analytics and Systems for Social Media and Big Data Applications, Lecture 9. Prof. James She, james.she@ust.hk

2 Announcements
1. Tutorial tomorrow: quick review (socket programming + VMs) and parallel computing. Have ready the image links from T5, as well as your result matrix from T6 (the one with size 5k*2K).
2. Final Project Guideline released on the course website.
3. Lecture arrangement:
   - next lecture: 50% a guest speaker, 50% a site visit; wait for /facebook announcement
   - lecture on Nov. 26: student presentations (15 mins including Q&A)

3 Selected Works from T6

4 Selected Works from T6

5 Last lecture

6 Outcome of this lecture
- Multimedia Big Data - 2
- Multimedia Big Data: Analytics of Connection Discovery

7 Multimedia Big Data

8 Multimedia Big Data
- High-dimensional and universal signals
- Social signals (by social network analytics)
- Content signals (by multimedia signal processing)
- Combined signals (by multimedia big data analytics)
- Cross-cultural/language information
- The data is unstructured

9 Multimedia Big Data
- The biggest Big Data
- Shares the same challenges as regular Big Data
- Can we process, compute and store analytics as below? Anything missing here?
[Figure: Big Data; Multimedia Big Data]

10 Multimedia Big Data: Image Processing
- Similar colour (RGB, HSV)
- Similar objects (SIFT)
- Similar texture (GIST)
- And many more
[Figure: sample set 1 and sample set 2, with panels RGB, HSV, SIFT, GIST]

11 Multimedia Big Data: Audio Processing
- Similar volume
- Similar content (natural language processing)
- Similar pitch (Fourier transform)
- And many more
[Figure: utterance "I like mountains"; panels: natural language processing, Fourier transform (male)]
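As a rough illustration of the "similar pitch (Fourier transform)" item, the sketch below estimates the dominant frequency of a mono signal with NumPy's real FFT. The function name `dominant_frequency` and the synthetic 220 Hz tone are assumptions made for this example, not part of the lecture; real pitch tracking usually needs windowing and harmonic analysis on top of this.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest spectral peak.

    Hypothetical helper for illustration: a crude stand-in for pitch.
    """
    signal = np.asarray(signal, dtype=float)
    # Remove the DC offset so the zero-frequency bin does not dominate.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

# Usage: one second of a synthetic 220 Hz tone sampled at 8 kHz.
t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 220 * t)
pitch = dominant_frequency(tone, 8000)
```

Two recordings could then be compared by the difference between their estimated dominant frequencies.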

12 Multimedia Big Data: Multimedia Data Processing
- People share text, audio, image and video (image + audio)
- Cross-disciplinary techniques are needed to understand the data
- They extract information that was not available or obvious before
- Text: natural language processing (cultures, location, etc.)
- Image: image processing techniques (colour, texture)
- Audio: signal processing (tones, backgrounds)
- Video: any suggestions?

13 Multimedia Big Data Analytics
- Involves content signals (multimedia signal processing)
[Figure labels: analysis, modelling and compute; collected data; vector forms; measurements and visualizations; analytics]

14 In-class Activity 2 (individual, 1 min, 1 page): Imagine you are sharing these 2 images in a social network, and can provide only 3 user tags for each image.
[Figure: Picture 1, Picture 2]

15 How do you tag this image?
Tags from previous class (two pictures): black car sportscar BMW Tiger windsor

16 An Example Work in Multimedia Big Data
"Connection Discovery Using Big Data of User-Shared Images in Social Media", IEEE Trans. on Multimedia, Sep, by M. Cheung, J. She, Z. Jie.

17 Introduction
Applications of the SG (social graph):
- item/info. recommendation (e.g., games, product, location, etc.)
- friendship recommendation
- and more
However, the SG is only available to giant companies like Facebook/Google.
[Figure: example social graphs over users A, B, C, D with edge weights 0.6, 0.1, 0.9]

18 Introduction - Motivations
Observation: friends share images that are visually similar and similarly tagged.
[Figure: images shared by user A, user B, user C, user D]
Can we recommend friends from their shared images with tags?

19 Introduction - Motivations
However, user tags are not always reliable.
[Figure: tie calculation among user A, user B, user C, user D, showing a missing connection and a wrong connection]
How can we understand the shared images without using unreliable user tags? Understand images visually!

20 What is a feature?
- Features are distinct points on images
- Similar objects share similar features
[Figure: users A, B, C; user-shared images; features]

21 Proposed Methods: BoFT
Extract features, and assign a non-user-generated label to each image to represent its features.

22 Proposed Methods: BoFT
- Obtain the user profiles (label distributions) that represent the characteristics of users
- User connections can be obtained from the profiles
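The two BoFT steps (label each image from its features, then count labels per user) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each image is already summarised by one fixed-length feature vector and that cluster centroids are given, and the names `assign_labels` and `user_profiles` are hypothetical.

```python
import numpy as np

def assign_labels(image_vectors, centroids):
    """Give each image the index of its nearest centroid (its non-user label)."""
    image_vectors = np.asarray(image_vectors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distances, shape (n_images, n_labels).
    diffs = image_vectors[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def user_profiles(labels, owners, n_users, n_labels):
    """Count how often each label occurs among the images of each user."""
    profiles = np.zeros((n_users, n_labels), dtype=int)
    np.add.at(profiles, (np.asarray(owners), np.asarray(labels)), 1)
    return profiles

# Usage with made-up data: 4 images, 2 labels, 2 users.
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_labels(vectors, centroids)
profiles = user_profiles(labels, owners=[0, 0, 1, 0], n_users=2, n_labels=2)
```

Each row of `profiles` is one user's label distribution, which is the input to the similarity calculation.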

23 Datasets
- Two general social networks, Skyrock and 163 Weibo
- Users can share images, videos and more
- # of user-shared images: 360,000+
- # of users
[Figure panels (a), (b): follower/followee; shared content]

24 Datasets
Measurements on histograms of the # of followers/followees and the # of shared images show that the users are a good representation.
[Figure: Skyrock; Weibo]

25 Data measurements
User profile of user $i$: $L_i = (l_{i,1}, \dots, l_{i,k}, \dots, l_{i,K})$, where $l_{i,k}$ is the number of occurrences of label $k$ for user $i$.
Similarity calculation: $S_{i,j} = S(L_i, L_j) = \frac{L_i \cdot L_j}{\|L_i\| \, \|L_j\|}$
Recall what you have learned about friendship similarity.
Shared images of related pairs (friends, $C = 1$) are more similar.
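A minimal sketch of the similarity calculation, reading the formula as the cosine similarity between two label-count profiles. The function name `profile_similarity` and the zero-vector convention (return 0 when a user has no labelled images) are assumptions for this example.

```python
import numpy as np

def profile_similarity(profile_i, profile_j):
    """Cosine similarity S_ij between two user profiles (label counts)."""
    li = np.asarray(profile_i, dtype=float)
    lj = np.asarray(profile_j, dtype=float)
    denom = np.linalg.norm(li) * np.linalg.norm(lj)
    if denom == 0.0:
        # Assumed convention: an empty profile is similar to nothing.
        return 0.0
    return float(li @ lj / denom)

# Usage: proportional profiles are maximally similar, disjoint ones are not.
same_taste = profile_similarity([2, 1, 0], [4, 2, 0])
no_overlap = profile_similarity([3, 0, 0], [0, 0, 5])
```

Because label counts are non-negative, the value always lies between 0 and 1.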

26 Data measurements
The histogram of $S_{i,j}$ (number of users given an $S_{i,j}$).
[Figure: panels for Skyrock and 163 Weibo]

27 A set diagram of data measured
Most pairs have low $S_{i,j}$, while a few have high $S_{i,j}$.

28 Measurements
Based on the histograms of $S_{i,j}$, we can estimate:
- $P(S_{i,j} \mid C = 1)$: probability density function of $S_{i,j}$ for a related pair
- $P(S_{i,j})$: probability density function of $S_{i,j}$ for all pairs
They can be estimated by:
$P(b \ge S_{i,j} \ge a) = \frac{\int_a^b n_{S_{i,j}} \, ds}{(b - a) \, N^{(p)}}$, $P(b \ge S_{i,j} \ge a \mid C = 1) = \frac{\int_a^b n_{S_{i,j}, C=1} \, ds}{(b - a) \, N^{(p)}_{C=1}}$
where $a = \lfloor B S_{i,j} \rfloor / B$ and $b = \lceil B S_{i,j} \rceil / B$, with $B = 10$.
What distribution are they? => $f(S_{i,j}) = \gamma e^{-\lambda S_{i,j}}$
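The estimation can be sketched as a normalised histogram with B = 10 bins on [0, 1], followed by a fit of the exponential form. The fitting method is not stated on the slide; the log-linear least-squares fit below, and the names `empirical_pdf` and `fit_exponential`, are assumptions for illustration.

```python
import numpy as np

def empirical_pdf(similarities, B=10):
    """Histogram-based density of S_ij on [0, 1] using B equal-width bins."""
    density, edges = np.histogram(similarities, bins=B, range=(0.0, 1.0), density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers, density

def fit_exponential(centers, density):
    """Fit density ~ gamma * exp(-lam * s) by least squares on log(density)."""
    centers = np.asarray(centers, dtype=float)
    density = np.asarray(density, dtype=float)
    mask = density > 0  # empty bins have no logarithm
    slope, intercept = np.polyfit(centers[mask], np.log(density[mask]), 1)
    return float(np.exp(intercept)), float(-slope)

# Usage on an exactly exponential curve (made-up parameters gamma=3, lam=4).
centers = (np.arange(10) + 0.5) / 10.0
gamma, lam = fit_exponential(centers, 3.0 * np.exp(-4.0 * centers))
```

Running the same two functions on all pairs and on related pairs only would give the two sets of parameters used in the Bayes step.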

29 Data measurements
[Figure: panels for Skyrock and 163 Weibo]

30 Recalled: observations
- Most pairs have low $S_{i,j}$, while a few have high $S_{i,j}$
- At high $S_{i,j}$, related pairs have higher $S_{i,j}$ than unrelated pairs
Can we make use of this observation for recommendation?

31 Can we utilize user-shared images for follower/followee recommendations?

32 Problem Formulation
Given $S_{i,j}$, how likely is it that users $i$ and $j$ are related? Mathematically: $P(C = 1 \mid S_{i,j})$
Using Bayes' theorem, it becomes:
$P(C = 1 \mid S_{i,j}) = \frac{P(S_{i,j} \mid C = 1) \, P(C = 1)}{P(S_{i,j})}$
where:
- $P(S_{i,j} \mid C = 1)$: PDF of $S_{i,j}$ for a related pair
- $P(S_{i,j})$: PDF of $S_{i,j}$ for all pairs
- $P(C = 1)$: probability of a related pair, or the network density

33 Problem Formulation
How can we calculate $P(S_{i,j} \mid C = 1)$ and $P(S_{i,j})$?
Recall: $P(S_{i,j}) = \gamma_a e^{-\lambda_a S_{i,j}}$, and $P(S_{i,j} \mid C = 1) = \gamma_f e^{-\lambda_f S_{i,j}}$

34 Problem Formulation
How can we calculate $P(C = 1)$? $P(C = 1)$ is the network density:
$P(C = 1) = \frac{N^{(p)}_{C=1}}{N^{(p)}}$
where:
- $N^{(p)}_{C=1}$ is the number of related pairs
- $N^{(p)}$ is the number of all pairs, which can be calculated as $N^{(p)} = \frac{N(N - 1)}{2}$
- $N$ is the number of users in the system

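35 Problem Formulation
Combining all the equations, we have:
$P(C = 1 \mid S_{i,j}) = \frac{\gamma_f e^{-\lambda_f S_{i,j}}}{\gamma_a e^{-\lambda_a S_{i,j}}} \cdot \frac{N^{(p)}_{C=1}}{N^{(p)}} = \frac{\gamma_f}{\gamma_a} \cdot \frac{2 N^{(p)}_{C=1}}{N(N - 1)} \, e^{(\lambda_a - \lambda_f) S_{i,j}}$
In another form:
$P(C = 1 \mid S_{i,j}) = \gamma_t e^{\lambda_t S_{i,j}}$
where $\gamma_t = \frac{\gamma_f}{\gamma_a} \cdot \frac{2 N^{(p)}_{C=1}}{N_u (N_u - 1)}$ and $\lambda_t = \lambda_a - \lambda_f$ (here $N_u$ is the number of users, written $N$ above).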

36 Problem Formulation
$\gamma_t = \frac{\gamma_f}{\gamma_a} \cdot \frac{2 N^{(p)}_{C=1}}{N_u (N_u - 1)}$ is always a positive number, so it does not affect the trend of $P(C = 1 \mid S_{i,j})$.
By measurement, $\lambda_t = 8.09$ and $\lambda_t = 2.36$ on Skyrock and 163 Weibo, respectively.
It can be concluded that a higher $S_{i,j}$ always implies a higher chance that users $i$ and $j$ are a related pair.
But how does this help follower/followee recommendation?
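A small sketch of the combined formula, assuming the two fitted densities have the form gamma * exp(-lambda * S) for all pairs (subscript a) and for related pairs (subscript f). The function names and all numeric inputs are made up for illustration; only the algebra follows the slides.

```python
import math

def posterior_params(gamma_f, lam_f, gamma_a, lam_a, n_related_pairs, n_users):
    """Return (gamma_t, lambda_t) of P(C=1 | S) = gamma_t * exp(lambda_t * S)."""
    n_pairs = n_users * (n_users - 1) / 2      # N(N - 1) / 2
    density = n_related_pairs / n_pairs        # P(C = 1), the network density
    gamma_t = gamma_f / gamma_a * density
    lam_t = lam_a - lam_f
    return gamma_t, lam_t

def posterior(s, gamma_t, lam_t):
    """Evaluate P(C = 1 | S_ij = s) under the fitted exponential model."""
    return gamma_t * math.exp(lam_t * s)

# Usage with hypothetical fitted values: 5 users, 1 related pair.
gamma_t, lam_t = posterior_params(2.0, 1.0, 4.0, 3.0, n_related_pairs=1, n_users=5)
low, high = posterior(0.1, gamma_t, lam_t), posterior(0.9, gamma_t, lam_t)
```

With a positive `lam_t`, `high` exceeds `low`, which is the monotonic trend the slide relies on. Since the two exponentials are fitted separately, the expression is best read as a score; nothing forces it to stay below 1 for every input.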

37 Follower/followee recommendation
Obtain a list of $J$ users, $U_{i,J}$, to recommend to user $i$: given the $S_{i,j}$, these are the users most likely to form related pairs with $i$.
Mathematical formulation:
$U_{i,J} = \arg\max_{U_{i,J}} P(U_{i,J} \mid S_{i,1}, \dots, S_{i,N_u})$
where $U_{i,J}$ is the set of $J$ users recommended to user $i$.
Using naive Bayes:
$U_{i,J} = \arg\max_{U_{i,J}} \prod_{j=1}^{J} P(C = 1 \mid S_{i,j})$

38 Follower/followee recommendation
$U_{i,J} = \arg\max_{U_{i,J}} \prod_{j=1}^{J} P(C = 1 \mid S_{i,j})$
As $P(C = 1 \mid S_{i,j})$ is a strictly increasing function of $S_{i,j}$, we have:
$U_{i,J} = \arg\max_{U_{i,J}} \sum_{j=1}^{J} S_{i,j}$
which is the set of $J$ users with the highest $S_{i,j}$ with $i$.
Let's see how good it is compared to 2 approaches:
- friendship similarity
- random
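Since the recommendation reduces to picking the J users with the highest similarity to user i, it can be sketched as a top-J selection over one row of a similarity matrix. The function name `recommend` and the example matrix are hypothetical.

```python
import numpy as np

def recommend(similarity, i, J):
    """Return the indices of the J users most similar to user i (excluding i)."""
    scores = np.asarray(similarity, dtype=float)[i].copy()
    scores[i] = -np.inf                     # never recommend a user to themselves
    order = np.argsort(-scores, kind="stable")
    return order[:J].tolist()

# Usage with a made-up symmetric similarity matrix for 4 users.
S = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
top_two_for_user_0 = recommend(S, i=0, J=2)
```

A full sort is used here for clarity; for large user bases a partial selection such as `np.argpartition` would avoid sorting every candidate.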

39 Results

40 Conclusion
- Related pairs are likely to share visually similar images (detectable by some multimedia signal processing techniques)
- Discovered useful formulations of user image sharing that are critical for analytics development
- Follower/followee recommendation can be based on similarity in image features

41 Future works
- How about other social networks? Do they follow the same distribution and trend?
- What about other image processing techniques?
- How to make a scalable big data system (as well as storage)? To be discussed in the next topic

42 An Example Work in Multimedia Big Data System
"A Cloud-assisted Framework for Bag-of-Features Tagging in Social Networks", IEEE 4th Symposium on Network Cloud Computing and Applications (NCCA), Jun, by Z. Jie, M. Cheung and J. She

43 Recalled Proposed Methods: BoFT
Extract features, and assign a non-user-generated label to each image to represent its features.
Which part is the most computationally intensive?

44 Proposed Methods: BoFT
k-means: computationally expensive!
Can we use multiple machines (e.g., VMs on Amazon) to speed up?

45 Cloud-Assisted Framework for BoFT
Using MapReduce to handle the k-means:
- Map: classification step; data-parallel over data points
- Reduce: recompute means; data-parallel over centers
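The map and reduce roles can be illustrated with a single-process sketch of one k-means iteration. This is not the framework from the paper and uses no Hadoop API; the function names are hypothetical, and in a real deployment the map calls would run on different VMs with the pairs shuffled by center id before the reduce.

```python
import numpy as np

def map_step(points, centers):
    """Map: classify each point, emitting (center_id, (point, 1))."""
    for p in points:
        k = int(np.argmin(((centers - p) ** 2).sum(axis=1)))
        yield k, (p, 1)

def reduce_step(pairs, centers):
    """Reduce: recompute each center as the mean of the points assigned to it."""
    new_centers = centers.astype(float).copy()
    sums, counts = {}, {}
    for k, (p, n) in pairs:
        sums[k] = sums.get(k, 0.0) + p
        counts[k] = counts.get(k, 0) + n
    for k in sums:
        new_centers[k] = sums[k] / counts[k]
    return new_centers  # centers with no assigned points are left unchanged

def kmeans_iteration(points, centers):
    """Run one map + reduce round of k-means."""
    return reduce_step(map_step(points, centers), centers)

# Usage with made-up 2-D points and two initial centers.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])
updated = kmeans_iteration(points, centers)
```

The iteration would be repeated until the centers stop moving; each round costs one pass over all points, which is why the map side benefits from more machines as the number of images grows.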

46 Experimental Results
Speedup $\theta$: execution time on a stand-alone machine ($T_s$) divided by the execution time on the cloud ($T_c$): $\theta = T_s / T_c$
- if $\theta = 1$: no difference between cloud and stand-alone machine
- if $\theta > 1$: cloud is faster
- if $\theta < 1$: stand-alone machine is faster
3 dimensions of the study:
- no. of VMs
- k (no. of clusters in k-means)
- no. of images involved
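The speedup definition translates directly into a few lines of code. The function names and the timings in the usage example are hypothetical; they are not measurements from the paper.

```python
def speedup(t_standalone, t_cloud):
    """theta = T_s / T_c: stand-alone execution time over cloud execution time."""
    if t_cloud <= 0:
        raise ValueError("cloud execution time must be positive")
    return t_standalone / t_cloud

def faster_platform(theta):
    """Interpret a speedup value as on the slide."""
    if theta > 1:
        return "cloud"
    if theta < 1:
        return "stand-alone"
    return "no difference"

# Usage with made-up timings in seconds.
theta = speedup(t_standalone=120.0, t_cloud=60.0)
verdict = faster_platform(theta)
```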

47 Experimental Results
- Dataset: Skyrock (same as the previous paper)
- Users can share images and other types of content
[Figure: follower/followee; shared content]

48 Experimental Results
- The # of images has to be large to make the cloud faster
- It is common for a social network to have >120k images

49 Experimental Results
The # of VMs has to be large to make the cloud faster.

50 Experimental Results
The cloud is better when k (# of unique labels) is larger.

51 Experimental Results
Scaleup defines the ability of an m-times larger system to perform on m-times larger datasets.
$T_m$: the execution time for m VMs to perform on m-times larger datasets.
[Table columns: $T_m$; # of images; # of VMs. Rows start at 10k images on 1 VM and end at 6 VMs]

52 Experimental Results
- Scaleup drops slowly with m, which means that more data and VMs reduce the efficiency only a little
- A good system should have a $T_m$ close to 1, meaning that increasing the data and the VMs together adds little overhead (e.g., communication among VMs)
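The slide text does not give the scaleup formula itself. The sketch below assumes the common definition from the parallel-systems literature, scaleup(m) = T_1 / T_m, where T_1 is the time for 1 VM on the base dataset and T_m the time for m VMs on an m-times larger dataset. Both that definition and the timings are assumptions for illustration.

```python
def scaleup(t_1, t_m):
    """Assumed definition: T_1 / T_m, where 1.0 means perfect scaleup."""
    if t_m <= 0:
        raise ValueError("execution time must be positive")
    return t_1 / t_m

# Usage with made-up execution times (seconds) keyed by m, the number of VMs.
times = {1: 100.0, 2: 110.0, 4: 125.0}
curve = {m: scaleup(times[1], t) for m, t in times.items()}
```

Under this definition a curve that stays near 1 as m grows indicates that the added VMs absorb the added data with little extra overhead.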

53 Conclusion
- A cloud-assisted framework improves the efficiency of BoFT for computing the analytics
- However, a cloud-assisted framework only helps with a large # of images and VMs
- The system design and resource optimization of the cloud platform are driven by the mechanism of the multimedia big data analytics

54 End of Lecture 9. Questions / Comments?


More information

Fast Fuzzy Clustering of Infrared Images. 2. brfcm

Fast Fuzzy Clustering of Infrared Images. 2. brfcm Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information

Programming assignment 3 Mean-shift

Programming assignment 3 Mean-shift Programming assignment 3 Mean-shift 1 Basic Implementation The Mean Shift algorithm clusters a d-dimensional data set by associating each point to a peak of the data set s probability density function.

More information

Engineering Data Intensive Scalable Systems

Engineering Data Intensive Scalable Systems Engineering Data Intensive Scalable Systems Introduction Internet services companies such as Google, Yahoo!, Amazon, and Facebook, have pioneered systems that have achieved unprecedented scale while still

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

Ma/CS 6b Class 26: Art Galleries and Politicians

Ma/CS 6b Class 26: Art Galleries and Politicians Ma/CS 6b Class 26: Art Galleries and Politicians By Adam Sheffer The Art Gallery Problem Problem. We wish to place security cameras at a gallery, such that they cover it completely. Every camera can cover

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Graph drawing in spectral layout

Graph drawing in spectral layout Graph drawing in spectral layout Maureen Gallagher Colleen Tygh John Urschel Ludmil Zikatanov Beginning: July 8, 203; Today is: October 2, 203 Introduction Our research focuses on the use of spectral graph

More information

The Future of High Performance Computing

The Future of High Performance Computing The Future of High Performance Computing Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Comparing Two Large-Scale Systems Oakridge Titan Google Data Center 2 Monolithic supercomputer

More information

Edge Histogram Descriptor, Geometric Moment and Sobel Edge Detector Combined Features Based Object Recognition and Retrieval System

Edge Histogram Descriptor, Geometric Moment and Sobel Edge Detector Combined Features Based Object Recognition and Retrieval System Edge Histogram Descriptor, Geometric Moment and Sobel Edge Detector Combined Features Based Object Recognition and Retrieval System Neetesh Prajapati M. Tech Scholar VNS college,bhopal Amit Kumar Nandanwar

More information

Homework #4 Programming Assignment Due: 11:59 pm, November 4, 2018

Homework #4 Programming Assignment Due: 11:59 pm, November 4, 2018 CSCI 567, Fall 18 Haipeng Luo Homework #4 Programming Assignment Due: 11:59 pm, ovember 4, 2018 General instructions Your repository will have now a directory P4/. Please do not change the name of this

More information

2. CONNECTIVITY Connectivity

2. CONNECTIVITY Connectivity 2. CONNECTIVITY 70 2. Connectivity 2.1. Connectivity. Definition 2.1.1. (1) A path in a graph G = (V, E) is a sequence of vertices v 0, v 1, v 2,..., v n such that {v i 1, v i } is an edge of G for i =

More information

Content Based Image Retrieval

Content Based Image Retrieval Content Based Image Retrieval R. Venkatesh Babu Outline What is CBIR Approaches Features for content based image retrieval Global Local Hybrid Similarity measure Trtaditional Image Retrieval Traditional

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Distributed Machine Learning Week #9

Machine Learning for Large-Scale Data Analysis and Decision Making A. Distributed Machine Learning Week #9 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Distributed Machine Learning Week #9 Today Distributed computing for machine learning Background MapReduce/Hadoop & Spark Theory

More information

Classification and Detection in Images. D.A. Forsyth

Classification and Detection in Images. D.A. Forsyth Classification and Detection in Images D.A. Forsyth Classifying Images Motivating problems detecting explicit images classifying materials classifying scenes Strategy build appropriate image features train

More information

Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior

Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior Who? Prof. Aris Anagnostopoulos Prof. Luciana S. Buriol Prof. Guido Schäfer What will We Cover? Topics: Network properties

More information

Multimedia Information Systems

Multimedia Information Systems Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 6: Text Information Retrieval 1 Digital Video Library Meta-Data Meta-Data Similarity Similarity Search Search Analog Video Archive

More information

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Creating a Recommender System. An Elasticsearch & Apache Spark approach Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused

More information

What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server

What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server What We Have Already Learned CSE 444: Database Internals Lectures 19-20 Parallel DBMSs Overall architecture of a DBMS Internals of query execution: Data storage and indexing Buffer management Query evaluation

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016

/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016 15-319 / 15-619 Cloud Computing Recitation 3 Sep 13 & 15, 2016 1 Overview Administrative Issues Last Week s Reflection Project 1.1, OLI Unit 1, Quiz 1 This Week s Schedule Project1.2, OLI Unit 2, Module

More information

Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge

Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge Exploiting Internal and External Semantics for the Using World Knowledge, 1,2 Nan Sun, 1 Chao Zhang, 1 Tat-Seng Chua 1 1 School of Computing National University of Singapore 2 School of Computer Science

More information

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia

More information

Overview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8

Overview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8 Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

27: Hybrid Graphical Models and Neural Networks

27: Hybrid Graphical Models and Neural Networks 10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look

More information

Thanks to Chris Bregler. COS 429: Computer Vision

Thanks to Chris Bregler. COS 429: Computer Vision Thanks to Chris Bregler COS 429: Computer Vision COS 429: Computer Vision Instructor: Thomas Funkhouser funk@cs.princeton.edu Preceptors: Ohad Fried, Xinyi Fan {ohad,xinyi}@cs.princeton.edu Web page: http://www.cs.princeton.edu/courses/archive/fall13/cos429/

More information

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES 1 RIMA TRI WAHYUNINGRUM, 2 INDAH AGUSTIEN SIRADJUDDIN 1, 2 Department of Informatics Engineering, University of Trunojoyo Madura,

More information

General Instructions. Questions

General Instructions. Questions CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

More information

An Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing

An Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing An Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing Shuzhi Nie Abstract Clustering is one of the most effective algorithms in data analysis and management.

More information