ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 9. Prof. James She
1 ELEC6910Q Analytics and Systems for Social Media and Big Data Applications, Lecture 9. Prof. James She, james.she@ust.hk
2 Announcements
1. Tutorial tomorrow: quick review (socket programming + VMs) and parallel computing. Have ready the image links from T5, as well as your result matrix from T6 (the one with size 5k*2K).
2. Final Project Guideline released on the course website.
3. Lecture arrangement: the next lecture is 50% guest speaker and 50% site visit (wait for the /facebook announcement); the lecture on Nov. 26 is the Student Presentation (15 mins including Q&A).
3 Selected Works from T6
4 Selected Works from T6
5 Last lecture
6 Outcome of this lecture: Multimedia Big Data - 2. Multimedia Big Data: Analytics of Connection Discovery
7 Multimedia Big Data
8 Multimedia Big Data: high-dimensional and universal signals; social signals (by social network analytics) + content signals (by multimedia signal processing), giving combined signals (by multimedia big data analytics); cross-cultural/language information; the data is unstructured.
9 Multimedia Big Data: the biggest Big Data. It shares the same challenges as regular Big Data. Can we process, compute and store analytics as below? Anything missing here? [Figure: Big Data vs. Multimedia Big Data]
10 Multimedia Big Data: Image Processing. Similar colour (RGB, HSV); similar objects (SIFT); similar texture (GIST); and many more. [Figure panels: RGB, HSV, SIFT, GIST; Sample Set 1, Sample Set 2]
11 Multimedia Big Data: Audio Processing. Similar volume; similar content (natural language processing); similar pitch (Fourier transform); and many more. [Figure: natural language processing of "I like mountains"; Fourier transform, "male"]
12 Multimedia Big Data: Multimedia Data Processing. People share text, audio, image and video (image + audio). Cross-disciplinary techniques are needed to understand the data and to extract information that was not available or obvious before. Text: natural language processing (cultures, location, etc.). Image: image processing techniques (colour, texture). Audio: signal processing (tones, backgrounds). Video: any suggestions?
13 Multimedia Big Data Analytics involves content signals, multimedia signal processing, analysis, modelling and compute, collected data (vector forms), measurements and visualizations, analytics.
14 In-class Activity 2 (individual, 1 min, 1 page): Imagine you are sharing these 2 images in a social network and can provide only 3 user tags for each image. [Figure: Picture 1, Picture 2]
15 How do you tag this image? Tags from a previous class: black car, sportscar, BMW, Tiger, windsor. [Figure: two pictures]
16 An Example Work in Multimedia Big Data: Connection Discovery Using Big Data of User-Shared Images in Social Media, IEEE Trans. on Multimedia, Sep by M. Cheung, J. She, Z. Jie.
17 Introduction. Applications of the social graph (SG): friendship recommendation; item/info. recommendation (e.g., games, products, locations, etc.); and more. [Figure: example social graph over users A, B, C, D with edge weights 0.6, 0.1 and 0.9] However, the SG is only available to giant companies like Facebook/Google.
18 Introduction - Motivations. Observation: friends share visually similar images and tags. [Figure: users A, B, C, D] Can we recommend friends from their shared images with tags?
19 Introduction - Motivations. However, user tags are not always reliable. [Figure: tie calculation among users A, B, C, D, showing a missing connection and a wrong connection] How can we understand the shared images without using unreliable user tags? Understand images visually!
20 What is a feature? Features are distinct points on images. Similar objects share similar features. [Figure: users A, B, C; user-shared images; features]
21 Proposed Methods: BoFT. Extract features, and assign a non-user-generated label to each image to represent its features.
22 Proposed Methods: BoFT. Obtain the user profiles (label distributions) that represent the characteristics of users. User connections can be obtained from the profiles.
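As a minimal sketch of the profile step, assuming each shared image has already been assigned an integer BoFT label in the range 0..K-1 (the function name `build_profile` and the input layout are illustrative assumptions, not taken from the slides), a user profile can be built by counting label occurrences:

```python
from collections import Counter


def build_profile(image_labels, num_labels):
    """Return the label-count profile of one user.

    image_labels: the BoFT label (an int in 0..num_labels-1) of each image
                  the user shared. This input layout is an assumption.
    num_labels:   K, the number of distinct labels.
    """
    counts = Counter(image_labels)
    # Entry k is the number of the user's images that carry label k.
    return [counts.get(k, 0) for k in range(num_labels)]


# Hypothetical user who shared five images labelled 0, 2, 2, 1, 2 with K = 4.
profile = build_profile([0, 2, 2, 1, 2], num_labels=4)
print(profile)
```

The list keeps a fixed length K so that profiles of different users can be compared position by position.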
23 Datasets. Two general social networks, Skyrock and 163 Weibo; users can share images, videos and more. # of user shared images: 360,000+. # of users [Figure: (a) follower/followee, (b) shared content]
24 Datasets. Measurements: histograms of the # of followers/followees and the # of shared images. They indicate that the sampled users are a good representation. [Figure panels: Skyrock, 163 Weibo]
25 Data measurements
User profile of user i: $L_i = (l_{i,1}, \dots, l_{i,k}, \dots, l_{i,K})$, where $l_{i,k}$ is the number of occurrences of label $k$ for user $i$.
Similarity calculation: $S_{i,j} = S(L_i, L_j) = \frac{L_i \cdot L_j}{\|L_i\| \, \|L_j\|}$
Recall what you have learned about friendship similarity.
Shared images of related pairs (friends, C=1) are more similar.
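The similarity above can be sketched in a few lines, reading the formula as the cosine similarity between two label-count profiles. The function name and the choice to return 0 for an empty profile are assumptions made for illustration:

```python
import math


def cosine_similarity(profile_i, profile_j):
    """Cosine similarity between two label-count profiles of equal length."""
    dot = sum(a * b for a, b in zip(profile_i, profile_j))
    norm_i = math.sqrt(sum(a * a for a in profile_i))
    norm_j = math.sqrt(sum(b * b for b in profile_j))
    if norm_i == 0 or norm_j == 0:
        # Assumption: a user with no labelled images is similar to nobody.
        return 0.0
    return dot / (norm_i * norm_j)


# Hypothetical profiles with K = 3 labels.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same label proportions
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # no label in common
```

Because label counts are non-negative, the result lies between 0 and 1, which matches the range used for the histograms of $S_{i,j}$.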
26 Data measurements. The histogram of $S_{i,j}$ (number of users given an $S_{i,j}$). [Figure panels: Skyrock, 163 Weibo]
27 A set diagram of the data measured. Most pairs have low $S_{i,j}$, while a few have high $S_{i,j}$.
28 Measurements
Based on the histograms of $S_{i,j}$, we can estimate:
$P(S_{i,j} \mid C = 1)$: probability density function of $S_{i,j}$ for a related pair
$P(S_{i,j})$: probability density function of $S_{i,j}$ for all pairs
They can be estimated by:
$P(b > S_{i,j} \ge a) = \frac{\int_a^b n_{S_{i,j}} \, ds}{(b-a)\, N^{(p)}}$, $P(b > S_{i,j} \ge a \mid C = 1) = \frac{\int_a^b n_{S_{i,j},C=1} \, ds}{(b-a)\, N^{(p)}_{C=1}}$
where $a = \lfloor B S_{i,j} \rfloor / B$, $b = \lceil B S_{i,j} \rceil / B$ with $B = 10$.
What distribution are they? => $f(S_{i,j}) = \gamma e^{-\lambda S_{i,j}}$
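A rough sketch of the histogram-based estimate, assuming similarities lie in [0, 1] and are split into B = 10 equal-width bins. The function name, the handling of the value 1.0 and the sample values are assumptions for illustration only:

```python
def estimate_pdf(similarities, num_bins=10):
    """Estimate a probability density from similarity values in [0, 1].

    Each bin's density is (count in bin) / (bin width * total count),
    so the densities multiplied by the bin width sum to 1.
    """
    counts = [0] * num_bins
    for s in similarities:
        # A value of exactly 1.0 is placed in the last bin (an assumption).
        index = min(int(s * num_bins), num_bins - 1)
        counts[index] += 1
    width = 1.0 / num_bins
    total = len(similarities)
    return [c / (width * total) for c in counts]


# Hypothetical similarity values for four user pairs.
density = estimate_pdf([0.05, 0.15, 0.15, 0.95])
print(density)
```

Running the same function once on all pairs and once on related pairs only would give the two estimated densities named on the slide.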
29 Data measurements [Figure panels: Skyrock, 163 Weibo]
30 Recalled: observations. Most pairs have low $S_{i,j}$, while a few have high $S_{i,j}$. Related pairs have a higher probability density than unrelated pairs at high $S_{i,j}$. Can we make use of this observation for recommendation?
31 Can we utilize user-shared images for follower/followee recommendations?
32 Problem Formulation
Given $S_{i,j}$, how likely are users i and j to be related? Mathematically: $P(C = 1 \mid S_{i,j})$
Using Bayes' theorem, it becomes: $P(C = 1 \mid S_{i,j}) = \frac{P(S_{i,j} \mid C = 1)\, P(C = 1)}{P(S_{i,j})}$
where:
$P(S_{i,j} \mid C = 1)$: PDF of $S_{i,j}$ for a related pair
$P(S_{i,j})$: PDF of $S_{i,j}$ for all pairs
$P(C = 1)$: probability of a related pair, or the network density
33 Problem Formulation
How can we calculate $P(S_{i,j} \mid C = 1)$ and $P(S_{i,j})$?
Recall: $P(S_{i,j}) = \gamma_a e^{-\lambda_a S_{i,j}}$, and $P(S_{i,j} \mid C = 1) = \gamma_f e^{-\lambda_f S_{i,j}}$
34 Problem Formulation
How can we calculate $P(C = 1)$? $P(C = 1)$ is the network density:
$P(C = 1) = \frac{N^{(p)}_{C=1}}{N^{(p)}}$, where $N^{(p)}_{C=1}$ is the number of related pairs and $N^{(p)}$ is the number of all pairs.
$N^{(p)}$ can be calculated as $N^{(p)} = \frac{N(N-1)}{2}$, where $N$ is the number of users in the system.
35 Problem Formulation
Combining all the equations, we have:
$P(C = 1 \mid S_{i,j}) = \frac{\gamma_f e^{-\lambda_f S_{i,j}}}{\gamma_a e^{-\lambda_a S_{i,j}}} \cdot \frac{N^{(p)}_{C=1}}{N^{(p)}} = \frac{\gamma_f}{\gamma_a} \cdot \frac{2 N^{(p)}_{C=1}}{N(N-1)} \, e^{(\lambda_a - \lambda_f) S_{i,j}}$
In another form: $P(C = 1 \mid S_{i,j}) = \gamma_t e^{\lambda_t S_{i,j}}$
where: $\gamma_t = \frac{\gamma_f}{\gamma_a} \cdot \frac{2 N^{(p)}_{C=1}}{N_u (N_u - 1)}$, $\lambda_t = \lambda_a - \lambda_f$, and $N_u = N$ is the number of users.
36 Problem Formulation
$\gamma_t = \frac{\gamma_f}{\gamma_a} \cdot \frac{2 N^{(p)}_{C=1}}{N_u (N_u - 1)}$ is always a positive number, which does not affect the trend of $P(C = 1 \mid S_{i,j})$.
By measurement, $\lambda_t = 8.09$ and $\lambda_t = 2.36$ on Skyrock and 163 Weibo, respectively.
It can be concluded that a higher $S_{i,j}$ always implies a higher chance for users i and j to be a related pair.
But how does this help follower/followee recommendation?
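The Bayes step can be sketched as below, assuming the two densities are the fitted exponentials recalled on the slides. All parameter values in the example are hypothetical placeholders, not the measured ones, and the function names are illustrative:

```python
import math


def network_density(num_related_pairs, num_users):
    """P(C = 1): related pairs divided by all N(N-1)/2 user pairs."""
    num_pairs = num_users * (num_users - 1) / 2
    return num_related_pairs / num_pairs


def posterior_related(s, gamma_f, lambda_f, gamma_a, lambda_a, density):
    """P(C = 1 | S = s) via Bayes' theorem with exponential densities."""
    likelihood = gamma_f * math.exp(-lambda_f * s)  # P(S | C = 1)
    evidence = gamma_a * math.exp(-lambda_a * s)    # P(S)
    return likelihood * density / evidence


# Hypothetical parameters chosen only to show the trend.
density = network_density(num_related_pairs=3, num_users=4)
low = posterior_related(0.0, 2.0, 1.0, 4.0, 3.0, density)
high = posterior_related(1.0, 2.0, 1.0, 4.0, 3.0, density)
print(low, high)
```

With $\lambda_a > \lambda_f$, that is $\lambda_t > 0$, the value grows with s, which is the trend the slides rely on.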
37 Follower/followee recommendation
Obtain a list of J users, $U_{i,J}$, to recommend to user i, given $S_{i,j}$, who are most likely to be related pairs with i.
Mathematical formulation: $U_{i,J} = \arg\max_{U_{i,J}} P(U_{i,J} \mid S_{i,1}, \dots, S_{i,N^{(u)}})$
where $U_{i,J}$ is the set of J users recommended to user i.
By using naïve Bayes: $U_{i,J} = \arg\max_{U_{i,J}} \prod_{j=1}^{J} P(C = 1 \mid S_{i,j})$
38 Follower/followee recommendation
$U_{i,J} = \arg\max_{U_{i,J}} \prod_{j=1}^{J} P(C = 1 \mid S_{i,j})$
As $P(C = 1 \mid S_{i,j})$ is a strictly increasing function of $S_{i,j}$, we have:
$U_{i,J} = \arg\max_{U_{i,J}} \sum_{j=1}^{J} S_{i,j}$
which is the set of users with the highest $S_{i,j}$ with i.
Let's see how good it is compared to 2 approaches: friendship similarity; random.
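Since the recommended set reduces to the J users with the highest similarity to user i, the selection can be sketched as a top-J lookup. The dictionary layout and the function name are assumptions for illustration:

```python
import heapq


def recommend_top_j(similarities, j):
    """Return the j candidate users with the highest similarity to user i.

    similarities: dict mapping candidate user id -> S(i, candidate).
    """
    return heapq.nlargest(j, similarities, key=similarities.get)


# Hypothetical similarities between user "a" and three candidates.
scores = {"b": 0.9, "c": 0.1, "d": 0.6}
print(recommend_top_j(scores, 2))
```

`heapq.nlargest` avoids sorting all candidates when J is much smaller than the number of users.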
39 Results
40 Conclusion. Related pairs are likely to share visually similar images (detectable by some multimedia signal processing techniques). Discovered useful formulations of user image sharing that are critical for analytics development. Follower/followee recommendation can be based on similarity in image features.
41 Future works. How about other social networks? Are they following the same distribution and trend? What about other image processing techniques? How to make a scalable big data system (as well as storage)? To be discussed in the next topic.
42 An Example Work in Multimedia Big Data System: A Cloud-assisted Framework for Bag-of-Features Tagging in Social Networks, IEEE 4th Symposium on Network Cloud Computing and Applications (NCCA), Jun by Z. Jie, M. Cheung and J. She
43 Recalled Proposed Methods: BoFT. Extract features, and assign a non-user-generated label to each image to represent its features. Which part is the most computationally intensive?
44 Proposed Methods: BoFT. k-means: computationally expensive! Can we use multiple machines (e.g., VMs on Amazon) to speed it up?
45 Cloud-Assisted Framework for BoFT. Using MapReduce to handle the k-means. Map: classification step; data-parallel over data points. Reduce: recompute means; data-parallel over centers.
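The Map and Reduce roles above can be illustrated with a single-process sketch of one k-means iteration. This is not the framework from the paper and does not use Hadoop; the function names and the toy data are assumptions. It only shows which work belongs to which step:

```python
from collections import defaultdict


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_step(points, centers):
    """Map (classification): emit (nearest center index, point) per point."""
    for point in points:
        nearest = min(range(len(centers)),
                      key=lambda c: squared_distance(point, centers[c]))
        yield nearest, point


def reduce_step(pairs, centers):
    """Reduce: recompute each center as the mean of its assigned points."""
    groups = defaultdict(list)
    for index, point in pairs:
        groups[index].append(point)
    new_centers = list(centers)  # a center with no points keeps its position
    for index, members in groups.items():
        dims = len(members[0])
        new_centers[index] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dims))
    return new_centers


def kmeans_iteration(points, centers):
    return reduce_step(map_step(points, centers), centers)


# Toy 2-D feature vectors and two initial centers (hypothetical values).
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers = kmeans_iteration(points, [(0, 1), (9, 9)])
print(centers)
```

In a distributed setting the points would be split across mappers and each reducer would handle a subset of the centers, repeating the iteration until the centers stop moving.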
46 Experimental Results. Speedup θ: execution time on a stand-alone machine (Ts) divided by that on the cloud (Tc), θ = Ts/Tc. If θ = 1: no difference between using the cloud and a stand-alone machine. If θ > 1: the cloud is faster. If θ < 1: the stand-alone machine is faster. 3 dimensions of the study: no. of VMs; k (no. of clusters in k-means); no. of images involved.
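A small sketch of the speedup definition, with made-up timings. The function names and the numbers are illustrative assumptions, not measurements from the experiments:

```python
def speedup(t_standalone, t_cloud):
    """theta = Ts / Tc, as defined on the slide."""
    return t_standalone / t_cloud


def faster_platform(theta):
    """Interpret theta following the three cases on the slide."""
    if theta > 1:
        return "cloud"
    if theta < 1:
        return "stand-alone"
    return "no difference"


# Hypothetical timings in seconds.
theta = speedup(t_standalone=120.0, t_cloud=60.0)
print(theta, faster_platform(theta))
```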
47 Experimental Results. Dataset: Skyrock (same as the previous paper). Users can share images and other types of content. [Figure: follower/followee, shared content]
48 Experimental Results. The # of images has to be large to make the cloud faster. It is common for a social network to have >120k images.
49 Experimental Results. The # of VMs has to be large to make the cloud faster.
50 Experimental Results. The cloud is better when k (# of unique labels) is larger.
51 Experimental Results. Scaleup defines the ability of an m-times larger system to perform on m-times larger datasets. Tm: the execution time for m VMs to perform on m-times larger datasets. [Table: Tm against # of images and # of VMs, from 10k images on 1 VM up to 6 VMs]
52 Experimental Results. Scaleup drops slowly with m, which means that more data and VMs reduce the efficiency only a little. A good system should have a Tm close to 1, which means that increasing the data and VMs will not add much overhead (e.g., communication among VMs).
53 Conclusion. A cloud-assisted framework improves the efficiency of BoFT for computing the analytics. However, a cloud-assisted framework only helps with a large # of images and VMs. The system design and resource optimization of the cloud platform are driven by the mechanism of the multimedia big data analytics.
54 End of Lecture 9. Questions / Comments?
Classification and Detection in Images D.A. Forsyth Classifying Images Motivating problems detecting explicit images classifying materials classifying scenes Strategy build appropriate image features train
More informationSocial, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior
Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior Who? Prof. Aris Anagnostopoulos Prof. Luciana S. Buriol Prof. Guido Schäfer What will We Cover? Topics: Network properties
More informationMultimedia Information Systems
Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 6: Text Information Retrieval 1 Digital Video Library Meta-Data Meta-Data Similarity Similarity Search Search Analog Video Archive
More informationCreating a Recommender System. An Elasticsearch & Apache Spark approach
Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused
More informationWhat We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server
What We Have Already Learned CSE 444: Database Internals Lectures 19-20 Parallel DBMSs Overall architecture of a DBMS Internals of query execution: Data storage and indexing Buffer management Query evaluation
More informationRobotics Programming Laboratory
Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car
More information/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016
15-319 / 15-619 Cloud Computing Recitation 3 Sep 13 & 15, 2016 1 Overview Administrative Issues Last Week s Reflection Project 1.1, OLI Unit 1, Quiz 1 This Week s Schedule Project1.2, OLI Unit 2, Module
More informationExploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge
Exploiting Internal and External Semantics for the Using World Knowledge, 1,2 Nan Sun, 1 Chao Zhang, 1 Tat-Seng Chua 1 1 School of Computing National University of Singapore 2 School of Computer Science
More informationTensor Decomposition of Dense SIFT Descriptors in Object Recognition
Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia
More informationOverview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8
Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationThanks to Chris Bregler. COS 429: Computer Vision
Thanks to Chris Bregler COS 429: Computer Vision COS 429: Computer Vision Instructor: Thomas Funkhouser funk@cs.princeton.edu Preceptors: Ohad Fried, Xinyi Fan {ohad,xinyi}@cs.princeton.edu Web page: http://www.cs.princeton.edu/courses/archive/fall13/cos429/
More informationAN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES
AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES 1 RIMA TRI WAHYUNINGRUM, 2 INDAH AGUSTIEN SIRADJUDDIN 1, 2 Department of Informatics Engineering, University of Trunojoyo Madura,
More informationGeneral Instructions. Questions
CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These
More informationAn Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing
An Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing Shuzhi Nie Abstract Clustering is one of the most effective algorithms in data analysis and management.
More information