Knowledge Discovery and Data Mining 1 (KU)

Simon Walk
IICM, TU Graz
October 22, 2015

KDDM 1 (KU) - Introduction
Simon Walk
Institute for Information Systems & Computer Media (IICM)
Inffeldgasse 16c/I, Office: D
simon.walk@tugraz.at
Research interests:
- Knowledge & Data Mining
- Social Network Analysis
- Semantic Web & Ontologies
- Dynamical Systems & Complex Networks
- Machine Learning

KDDM 1 (KU) - Introduction: Course Context & Goals
Why should you be interested in KDDM1 (KU)?
- It consolidates and reinforces the (theoretical) knowledge you obtained in KDDM1 (VO) with practical, hands-on experience.
- It helps a LOT for the final exam!
- It is good preparation for KDDM2!
- You get to feel like a data scientist!
- If you are interested, you can continue with a Master's project or a Master's thesis.

KDDM 1 (KU) - Organization: Course Organization
You have to:
1. form small groups of up to two students,
2. choose one of the two practical assignments,
3. work on your chosen assignment, and
4. give two presentations (in English) on the progress and results of your assignment.
After forming a group, send one email to simon.walk@tugraz.at with the names and student IDs (Matrikelnummern) of all group members. All emails must include [KDDM1] in the subject!

Project 1 - Crawling, Cleaning and Clustering
Objective: Group (semantically) similar pages of a website according to their most relevant terms!
- Write a web crawler to collect pages/documents that contain text.
- Clean the crawled pages of all markup and other unwanted content (e.g., HTML, JavaScript).
- Calculate similarities between the pages (e.g., by calculating similarities between the TF-IDF vectors of the pages).
- Group similar pages (e.g., by using a clustering algorithm such as k-means).
Hint: Python, scikit-learn, SciPy and NumPy already provide most of the functionality required to solve this task!
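As a rough sketch of the cleaning, TF-IDF and clustering steps (not a reference solution for the assignment), the code below assumes the pages have already been downloaded as raw HTML strings. It strips markup with Python's built-in `html.parser`, skipping the contents of `<script>` and `<style>` tags, and then uses scikit-learn's `TfidfVectorizer`, `cosine_similarity` and `KMeans`. The helper names `clean_html` and `cluster_pages`, the English stop-word list and the value of `n_clusters` are illustrative assumptions.

```python
from html.parser import HTMLParser

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_html(html):
    """Return the page's visible text as a single whitespace-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def cluster_pages(html_pages, n_clusters=2, seed=0):
    """Clean pages, build TF-IDF vectors, compute similarities, run k-means."""
    texts = [clean_html(h) for h in html_pages]
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(texts)  # sparse (pages x terms), rows L2-normalized
    similarity = cosine_similarity(tfidf)    # dense (pages x pages) matrix
    # k-means uses Euclidean distance; on unit-length rows this is
    # monotone in cosine distance (||a - b||^2 = 2 - 2 cos(a, b)).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(tfidf)
    return texts, similarity, labels
```

The number of clusters has to be chosen for the crawled site, for example by comparing several values of `n_clusters`; inspecting the highest-weighted TF-IDF terms per cluster helps to interpret the groups.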

Project 1 - Crawling, Cleaning and Clustering
A word of warning: Be careful when crawling websites! Don't hammer the servers, or you risk getting banned!
Either select smaller websites for crawling (complete crawl) or choose an appropriate sampling strategy for selecting the pages to analyze.
Rule of thumb: your dataset should consist of at least 1,000 pages!
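One way to follow this advice is sketched below using only the standard library: a breadth-first crawler that stays on the seed URL's domain, pauses `delay` seconds after every request, can optionally honor robots.txt via `urllib.robotparser`, and stops after `max_pages` pages. The function names, the user-agent string and the one-second default delay are assumptions rather than course requirements; the `fetch` parameter exists so the crawler can be exercised without network access.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url, user_agent="kddm1-student-crawler"):
    """Download one page; returns "" for non-HTML responses."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        if "text/html" not in response.headers.get("Content-Type", ""):
            return ""
        return response.read().decode("utf-8", errors="replace")


def load_robots(seed):
    """Fetch and parse the site's robots.txt (requires network access)."""
    robots = RobotFileParser(urljoin(seed, "/robots.txt"))
    robots.read()
    return robots


def crawl(seed, max_pages=1000, delay=1.0, fetch=fetch_url, robots=None):
    """Breadth-first crawl restricted to the seed's domain.

    Returns a dict {url: html}, in the order the pages were fetched.
    """
    domain = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        # Rules for the generic "*" agent only; pass your own agent name if needed.
        if robots is not None and not robots.can_fetch("*", url):
            continue
        try:
            html = fetch(url)
        except OSError:  # covers URLError, HTTPError and socket timeouts
            html = None
        time.sleep(delay)  # be polite: pause after every request
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))  # absolute, no #fragment
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For a complete crawl of a small site, set `max_pages` above the site's size. For a large site, capping a breadth-first crawl is only one simple strategy; sampling URLs randomly from the queue would be one alternative.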

Project 2 - Movie Recommender
Objective: Recommend similar movies to users using matrix factorization!
- Crawl or download a movie-ratings dataset (we suggest using MovieLens 100k).
- Create/extract the required utility matrix and minimize noise (e.g., subtract averages).
- Perform UV decomposition of the $n \times m$ utility matrix, $M \approx UV$, to obtain $U \in \mathbb{R}^{n \times d}$ and $V \in \mathbb{R}^{d \times m}$ with $d = 2$ or $d = 3$.
- Plot and interpret your findings.
Hint: Python, scikit-learn, SciPy and NumPy already provide many of the functions and tools required to solve this task!
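A minimal NumPy sketch of these steps follows, assuming the ratings arrive as `(user_index, movie_index, rating)` triples with 0-based indices (MovieLens 100k's `u.data` file stores 1-based user and item IDs, so they would need shifting). Blank entries are stored as NaN, each user's mean rating is subtracted, and $U$ and $V$ are fitted by plain gradient descent on the squared error $\sum_{(i,j)\ \text{observed}} (m_{ij} - (UV)_{ij})^2$, ignoring blanks. The function names, learning rate, iteration count and the cosine-similarity ranking of movies in the latent space are illustrative choices, not the course's prescribed method.

```python
import numpy as np


def utility_matrix(ratings, n_users, n_items):
    """Build an n_users x n_items matrix from (user, item, rating) triples.

    Missing ratings (blank entries) are stored as NaN.
    """
    M = np.full((n_users, n_items), np.nan)
    for user, item, rating in ratings:
        M[user, item] = rating
    return M


def subtract_user_means(M):
    """Subtract each user's mean over their rated items only (0 if none)."""
    observed = ~np.isnan(M)
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.nansum(M, axis=1, keepdims=True)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return M - means, means


def uv_decompose(R, d=2, lr=0.01, epochs=2000, seed=0):
    """Factor R (n x m, NaN = blank) into U (n x d) and V (d x m).

    Gradient descent on 0.5 * squared error over the observed entries.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    mask = ~np.isnan(R)
    target = np.where(mask, R, 0.0)
    U = rng.normal(scale=0.1, size=(n, d))
    V = rng.normal(scale=0.1, size=(d, m))
    for _ in range(epochs):
        err = np.where(mask, U @ V - target, 0.0)  # blanks contribute no error
        grad_U = err @ V.T
        grad_V = U.T @ err
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


def rmse(R, U, V):
    """Root-mean-square error of U @ V on the observed entries of R."""
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((U @ V - R)[mask] ** 2)))


def most_similar_items(V, item, k=3):
    """Rank the other items by cosine similarity of their columns in V."""
    norms = np.maximum(np.linalg.norm(V, axis=0, keepdims=True), 1e-12)
    cols = V / norms
    sims = cols.T @ cols[:, item]
    return [int(j) for j in np.argsort(-sims) if j != item][:k]
```

With `d = 2`, each column of `V` is a point in the plane, so the movies can be scatter-plotted (e.g., with matplotlib) and inspected for interpretable directions; a predicted rating for user `i` and movie `j` is `U[i] @ V[:, j] + means[i, 0]`.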

Project Presentations
The presentations will take place after Partial Exams 2 and 3.
Presentation 1 (after Partial Exam 2): prepare a 5-minute presentation (strict) with 3 slides:
1. Dataset
2. Experimental setup
3. Preliminary results
Presentation 2 (after Partial Exam 3): prepare a 10-minute presentation (strict) with 5 slides:
1. Introduction/Motivation
2. Methodology
3. Experimental setup
4. Results
5. Discussion

Project Presentations
Send your slides as a PDF by …:59 for presentation 1 and by …:59 for presentation 2. The email subject must include [KDDM1].
Note that presentations exceeding the 5- or 10-minute limit will be interrupted and stopped!
Your KU grade depends on your presentations and your results!

Questions?

Thanks!


More information

Certified Data Science with Python Professional VS-1442

Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional Certified Data Science with Python Professional Certification Code VS-1442 Data science has become

More information

: Intro Programming for Scientists and Engineers Final Exam

: Intro Programming for Scientists and Engineers Final Exam Final Exam Page 1 of 6 600.112: Intro Programming for Scientists and Engineers Final Exam Peter H. Fröhlich phf@cs.jhu.edu December 20, 2012 Time: 40 Minutes Start here: Please fill in the following important

More information

COAP 3110 INTERACTIVE SITE DEVELOPMENT

COAP 3110 INTERACTIVE SITE DEVELOPMENT COAP 3110 INTERACTIVE SITE DEVELOPMENT http://wwwai.wu-wien.ac.at/~hahsler/webster/coap3110/ Instructor Michael Hahsler Tel. 31336/6081 0699 100 00 598 E-mail: hahsler@ai.wu-wien.ac.at 1 Course description

More information

OHJ-306x: Software Testing Introduction to the Course Project Part 1: General Information and Project phases 1 & 2: Unit testing

OHJ-306x: Software Testing Introduction to the Course Project Part 1: General Information and Project phases 1 & 2: Unit testing 1 OHJ-306x: Software Testing Introduction to the Course Project Part 1: General Information and Project phases 1 & 2: Unit testing Antti Jääskeläinen, leading course assistant Matti Vuori, course assistant

More information

Lecture 1. Course Overview, Python Basics

Lecture 1. Course Overview, Python Basics Lecture 1 Course Overview, Python Basics We Are Very Full! Lectures and Labs are at fire-code capacity We cannot add sections or seats to lectures You may have to wait until someone drops No auditors are

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

MSW Online Application Steps (walk-through)

MSW Online Application Steps (walk-through) MSW Online Application Steps (walk-through) Click here to visit the MSW Application page of the School of Social Work Website. October 1, 2018 January 5, 2019 9/28/18 1 Application Process Step 1: Apply

More information

Analyzing OpenCourseWare usage by means of social tagging

Analyzing OpenCourseWare usage by means of social tagging Analyzing OpenCourseWare usage by means of social tagging Julià Minguillón 1,2, Jordi Conesa 2 1 UOC UNESCO Chair in e Learning 2 Universitat Oberta de Catalunya Barcelona, Spain Table of contents Open

More information

A Survey On Data Mining Algorithm

A Survey On Data Mining Algorithm A Survey On Data Mining Algorithm Rohit Jacob Mathew 1 Sasi Rekha Sankar 1 Preethi Varsha. V 2 1 Dept. of Software Engg., 2 Dept. of Electronics & Instrumentation Engg. SRM University India Abstract This

More information

Conclusions. Chapter Summary of our contributions

Conclusions. Chapter Summary of our contributions Chapter 1 Conclusions During this thesis, We studied Web crawling at many different levels. Our main objectives were to develop a model for Web crawling, to study crawling strategies and to build a Web

More information

Research Computing with Python, Lecture 1

Research Computing with Python, Lecture 1 Research Computing with Python, Lecture 1 Ramses van Zon SciNet HPC Consortium November 4, 2014 Ramses van Zon (SciNet HPC Consortium)Research Computing with Python, Lecture 1 November 4, 2014 1 / 35 Introduction

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information