Data Analytics. Qualification Exam, May 18, am 12noon

Size: px
Start display at page:

Download "Data Analytics. Qualification Exam, May 18, am 12noon"

Transcription

1 CS220 Data Analytics Number assigned to you: Qualification Exam, May 18, am 12noon Note: DO NOT write any information related to your name or KAUST student ID. 1. There should be 12 pages including this cover page. 2. Closed-book exam. No books, notes, computers, phones, or internet access. 3. A calculator with basic functions is allowed. 4. If you need more room to work out your answer to a question, please use the back of the page and clearly indicate that we should look there. 5. You have 180 minutes. No extension will be given. 6. Good luck! Grading: (for instructor use only) Question Topic Max. Score Score 1 Local search 12 2 Constraint satisfaction problem 8 3 Principal component analysis 10 4 Data preprocessing 8 5 ROC curves and AUC 10 6 Counting 12 7 Maximum likelihood estimation 8 8 Feature selection 12 9 A* Search Decision tree 10 Total 100 1

2 1. (12 points) Local search Suppose you are given a problem and are asked to solve it by genetic algorithm. List six components of the genetic algorithm that need to be defined/specified before you can apply the genetic algorithm to solve the problem. 2

3 2. (8 points) Constraint satisfaction problem Please fill the following form for arc consistency check. Arc Examined Value deleted Note: Value deleted should be answered with the value and the corresponding node. 3

4 3. (10 points) Principal Component Analysis Given three data points in three-dimensional space, (1,1,1), (2,2,4) and (3,3,7). Please show how to use PCA to reduce the dimensionality of the data. 4

5 4. (8 points) Data Preprocessing You are given a classification dataset that consists of 8 data samples, each of which is represented by three features. Sample Feature 1 Feature 2 Feature 3 Label index S S S S S S S S Now you are going to solve a supervised learning problem on this dataset. Please normalize the training data to make sure each feature value after normalization is valued between 0 and 1. Please give the training data after normalization. Hint: training data may not necessarily be the entire data set. 5

6 5. (10 points) ROC curves and AUC Given a dataset that contains five data samples, each of which is represented by one feature. The feature values and the corresponding labels are given in the table below. Please draw the ROC curve for this dataset and calculate the AUC. (Hint: no need to smooth the ROC curve) Sample index Feature values Label S S S S S

7 6. (12 points) Counting (This question has THREE subquestions) Suppose you want to train a classification model, P(X Y), where X is the feature vector of length n (n features) and Y is the class label. Assume that each feature has two possible discrete values and there are three possible classes. a. (4 points) How many independent parameters do you need to train in order to directly learn P(X Y)? b. (4 points) If we use Naïve Bayes P(X Y), what assumption do you need to make? c. (4 points) If we use Naïve Bayes and suppose your assumption holds, how many independent parameters do you need to learn? 7

8 7. (8 points) Maximum likelihood estimation In DNA, also known as the Code of Life, there exist four different possible bases: adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T). Now, you are given an organism which has a set of unknown DNA base frequencies. Let p A, p C, p G, and p T be those unknown frequencies. Assume that you obtain a strand of DNA and you want to infer the unknown frequencies. Let n A, n C, n G, n T be the corresponding number of bases that you observe for A, C, G and T. Please infer the maximum likelihood estimates of the unknown parameters p A, p C, p G, and p T. 8

9 8. (12 points) Feature Selection (This question has TWO subquestions) a. (6 points) What is the main difference between filter methods and wrapper methods for feature selection? b. (6 points) List the advantage and disadvantage of filter methods and wrapper methods for feature selection. 9

10 9. (10 points) A* Search Please use A* search to solve the following problem, where S is the starting node and G is the goal node. The heuristic function values for each node and the edge weights are known. Specify at each step: the nodes that have been expanded; the nodes in the queue; the next node selected at this step to be expanded; and the evaluation value, i.e., f, for this selected node Step Nodes expanded Nodes in queue Next node to expand f for the next node 1 None S S 10 Please fill the table by the standard A* search until the algorithm terminates. Please list the final path from S to G selected by the A* search and the final cost of the path below: 10

11 10. (10 points) Decision tree Consider the following training data and the following decision tree learned from this data using the ID3 algorithm (without any post-pruning). The last column is the class label. Show that the choice of the Wind attribute at the second level of the tree is correct, by showing that its information gain is superior to the alternative choices. The definition for information gain is Gain = Entropy(p) Entropy(i) 11

12 12

7 Techniques for Data Dimensionality Reduction

7 Techniques for Data Dimensionality Reduction 7 Techniques for Data Dimensionality Reduction Rosaria Silipo KNIME.com The 2009 KDD Challenge Prediction Targets: Churn (contract renewals), Appetency (likelihood to buy specific product), Upselling (likelihood

More information

CS 540: Introduction to Artificial Intelligence

CS 540: Introduction to Artificial Intelligence CS 540: Introduction to Artificial Intelligence Midterm Exam: 7:15-9:15 pm, October, 014 Room 140 CS Building CLOSED BOOK (one sheet of notes and a calculator allowed) Write your answers on these pages

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Topics In Feature Selection

Topics In Feature Selection Topics In Feature Selection CSI 5388 Theme Presentation Joe Burpee 2005/2/16 Feature Selection (FS) aka Attribute Selection Witten and Frank book Section 7.1 Liu site http://athena.csee.umbc.edu/idm02/

More information

Exam Advanced Data Mining Date: Time:

Exam Advanced Data Mining Date: Time: Exam Advanced Data Mining Date: 11-11-2010 Time: 13.30-16.30 General Remarks 1. You are allowed to consult 1 A4 sheet with notes written on both sides. 2. Always show how you arrived at the result of your

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017 Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/11/16 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

The American University in Cairo Department of Computer Science & Engineering CSCI &09 Dr. KHALIL Exam-I Fall 2011

The American University in Cairo Department of Computer Science & Engineering CSCI &09 Dr. KHALIL Exam-I Fall 2011 The American University in Cairo Department of Computer Science & Engineering CSCI 106-07&09 Dr. KHALIL Exam-I Fall 2011 Last Name :... ID:... First Name:... Form I Section No.: EXAMINATION INSTRUCTIONS

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Counting Product Rule

Counting Product Rule Counting Product Rule Suppose a procedure can be broken down into a sequence of two tasks. If there are n 1 ways to do the first task and n 2 ways to do the second task, then there are n 1 * n 2 ways to

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information

Classification. Slide sources:

Classification. Slide sources: Classification Slide sources: Gideon Dror, Academic College of TA Yaffo Nathan Ifill, Leicester MA4102 Data Mining and Neural Networks Andrew Moore, CMU : http://www.cs.cmu.edu/~awm/tutorials 1 Outline

More information

Midterm Examination CS 540-2: Introduction to Artificial Intelligence

Midterm Examination CS 540-2: Introduction to Artificial Intelligence Midterm Examination CS 54-2: Introduction to Artificial Intelligence March 9, 217 LAST NAME: FIRST NAME: Problem Score Max Score 1 15 2 17 3 12 4 6 5 12 6 14 7 15 8 9 Total 1 1 of 1 Question 1. [15] State

More information

Bayes Classifiers and Generative Methods

Bayes Classifiers and Generative Methods Bayes Classifiers and Generative Methods CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Stages of Supervised Learning To

More information

Homework #6 (Constraint Satisfaction, Non-Deterministic Uncertainty and Adversarial Search) Out: 2/21/11 Due: 2/29/11 (at noon)

Homework #6 (Constraint Satisfaction, Non-Deterministic Uncertainty and Adversarial Search) Out: 2/21/11 Due: 2/29/11 (at noon) CS121 Introduction to Artificial Intelligence Winter 2011 Homework #6 (Constraint Satisfaction, Non-Deterministic Uncertainty and Adversarial Search) Out: 2/21/11 Due: 2/29/11 (at noon) How to complete

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 2:00pm-3:30pm, Tuesday, December 15th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

CS 540: Introduction to Artificial Intelligence

CS 540: Introduction to Artificial Intelligence CS 540: Introduction to Artificial Intelligence Final Exam: 12:25-2:25pm, December 17, 2014 Room 132 Noland CLOSED BOOK (two sheets of notes and a calculator allowed) Write your answers on these pages

More information

Knowledge-based systems in bioinformatics 1MB602. Exam

Knowledge-based systems in bioinformatics 1MB602. Exam Knowledge-based systems in bioinformatics MB602 Exam 2007-04-7 Contact under exam: Robin Andersson 08-47 66 86 Grading: 3: 20p 4: 27p 5: 36p Max: 45p General instructions:. Keep it short! 2. If anything

More information

To earn the extra credit, one of the following has to hold true. Please circle and sign.

To earn the extra credit, one of the following has to hold true. Please circle and sign. CS 188 Spring 2011 Introduction to Artificial Intelligence Practice Final Exam To earn the extra credit, one of the following has to hold true. Please circle and sign. A I spent 3 or more hours on the

More information

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001 CPSC 211, Sections 201 203: Data Structures and Implementations, Honors Final Exam May 4, 2001 Name: Section: Instructions: 1. This is a closed book exam. Do not use any notes or books. Do not confer with

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 25, 2015 Today: Graphical models Bayes Nets: Inference Learning EM Readings: Bishop chapter 8 Mitchell

More information

Data Mining and Knowledge Discovery Practice notes 2

Data Mining and Knowledge Discovery Practice notes 2 Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

More information

IMPORTANT: Circle the last two letters of your class account:

IMPORTANT: Circle the last two letters of your class account: Spring 2011 University of California, Berkeley College of Engineering Computer Science Division EECS MIDTERM I CS 186 Introduction to Database Systems Prof. Michael J. Franklin NAME: STUDENT ID: IMPORTANT:

More information

Department of Computer Science and Engineering. COSC 4213: Computer Networks II (Fall 2005) Instructor: N. Vlajic Date: November 3, 2005

Department of Computer Science and Engineering. COSC 4213: Computer Networks II (Fall 2005) Instructor: N. Vlajic Date: November 3, 2005 Department of Computer Science and Engineering COSC 4213: Computer Networks II (Fall 2005) Instructor: N. Vlajic Date: November 3, 2005 Midterm Examination Instructions: Examination time: 75 min. Print

More information

1 Document Classification [60 points]

1 Document Classification [60 points] CIS519: Applied Machine Learning Spring 2018 Homework 4 Handed Out: April 3 rd, 2018 Due: April 14 th, 2018, 11:59 PM 1 Document Classification [60 points] In this problem, you will implement several text

More information

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016 CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Admin Course add/drop deadline tomorrow. Assignment 1 is due Friday. Setup your CS undergrad account ASAP to use Handin: https://www.cs.ubc.ca/getacct

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Advanced Search Genetic algorithm

Advanced Search Genetic algorithm Advanced Search Genetic algorithm Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [Based on slides from Jerry Zhu, Andrew Moore http://www.cs.cmu.edu/~awm/tutorials

More information

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points] CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.

More information

Classification Algorithms in Data Mining

Classification Algorithms in Data Mining August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

More information

Features: representation, normalization, selection. Chapter e-9

Features: representation, normalization, selection. Chapter e-9 Features: representation, normalization, selection Chapter e-9 1 Features Distinguish between instances (e.g. an image that you need to classify), and the features you create for an instance. Features

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 9, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 9, 2014 1 / 47

More information

Midterm Examination CS540-2: Introduction to Artificial Intelligence

Midterm Examination CS540-2: Introduction to Artificial Intelligence Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

Introduction to AI Spring 2006 Dan Klein Midterm Solutions

Introduction to AI Spring 2006 Dan Klein Midterm Solutions NAME: SID#: Login: Sec: 1 CS 188 Introduction to AI Spring 2006 Dan Klein Midterm Solutions 1. (20 pts.) True/False Each problem is worth 2 points. Incorrect answers are worth 0 points. Skipped questions

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Midterm 1

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Midterm 1 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Midterm 1 PRINT Your Name:, (last) SIGN Your Name: (first) PRINT Your Student ID: CIRCLE your exam room: 1 Pimentel 141 Mccone

More information

CSCI-630 Foundations of Intelligent Systems Fall 2015, Prof. Zanibbi

CSCI-630 Foundations of Intelligent Systems Fall 2015, Prof. Zanibbi CSCI-630 Foundations of Intelligent Systems Fall 2015, Prof. Zanibbi Midterm Examination Name: October 16, 2015. Duration: 50 minutes, Out of 50 points Instructions If you have a question, please remain

More information

Student Number: Lab day:

Student Number: Lab day: CSC 148H1 Summer 2008 Midterm Test Duration 60 minutes Aids allowed: none Last Name: Student Number: Lab day: First Name: Lecture Section: L0101 Instructor: R. Danek Do not turn this page until you have

More information

UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES

UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES Data Pre-processing-Data Cleaning, Integration, Transformation, Reduction, Discretization Concept Hierarchies-Concept Description: Data Generalization And

More information

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes

More information

CS-171, Intro to A.I. Mid-term Exam Winter Quarter, 2016

CS-171, Intro to A.I. Mid-term Exam Winter Quarter, 2016 CS-171, Intro to A.I. Mid-term Exam Winter Quarter, 016 YOUR NAME: YOUR ID: ID TO RIGHT: ROW: SEAT: The exam will begin on the next page. Please, do not turn the page until told. When you are told to begin

More information

Problems 1 and 5 were graded by Amin Sorkhei, Problems 2 and 3 by Johannes Verwijnen and Problem 4 by Jyrki Kivinen. Entropy(D) = Gini(D) = 1

Problems 1 and 5 were graded by Amin Sorkhei, Problems 2 and 3 by Johannes Verwijnen and Problem 4 by Jyrki Kivinen. Entropy(D) = Gini(D) = 1 Problems and were graded by Amin Sorkhei, Problems and 3 by Johannes Verwijnen and Problem by Jyrki Kivinen.. [ points] (a) Gini index and Entropy are impurity measures which can be used in order to measure

More information

Exam Marco Kuhlmann. This exam consists of three parts:

Exam Marco Kuhlmann. This exam consists of three parts: TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding

More information

CS 481/681 Advanced Computer Graphics, Spring 2004 Take-Home Midterm Exam

CS 481/681 Advanced Computer Graphics, Spring 2004 Take-Home Midterm Exam CS 481/681 Advanced Computer Graphics, Spring 2004 Take-Home Midterm Exam Name This exam is to be done individually. It is due, in paper form, STAPLED, at the start of class on Wednesday, March 24, 2003.

More information

CS273 Midterm Exam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014

CS273 Midterm Exam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014 CS273 Midterm Eam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014 Your name: Your UCINetID (e.g., myname@uci.edu): Your seat (row and number): Total time is 80 minutes. READ THE

More information

Alternatives to Direct Supervision

Alternatives to Direct Supervision CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of

More information

Machine Learning for. Artem Lind & Aleskandr Tkachenko

Machine Learning for. Artem Lind & Aleskandr Tkachenko Machine Learning for Object Recognition Artem Lind & Aleskandr Tkachenko Outline Problem overview Classification demo Examples of learning algorithms Probabilistic modeling Bayes classifier Maximum margin

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

WRAPPER feature selection method with SIPINA and R (RWeka package). Comparison with a FILTER approach implemented into TANAGRA.

WRAPPER feature selection method with SIPINA and R (RWeka package). Comparison with a FILTER approach implemented into TANAGRA. 1 Topic WRAPPER feature selection method with SIPINA and R (RWeka package). Comparison with a FILTER approach implemented into TANAGRA. Feature selection. The feature selection 1 is a crucial aspect of

More information

Bayes Net Learning. EECS 474 Fall 2016

Bayes Net Learning. EECS 474 Fall 2016 Bayes Net Learning EECS 474 Fall 2016 Homework Remaining Homework #3 assigned Homework #4 will be about semi-supervised learning and expectation-maximization Homeworks #3-#4: the how of Graphical Models

More information

CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002

CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002 CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002 Name: Instructions: 1. This is a closed book exam. Do not use any notes or books, other than your 8.5-by-11 inch review sheet. Do not confer

More information

CS303 LOGIC DESIGN FINAL EXAM

CS303 LOGIC DESIGN FINAL EXAM JANUARY 2017. CS303 LOGIC DESIGN FINAL EXAM STUDENT NAME & ID: DATE: Instructions: Examination time: 100 min. Write your name and student number in the space provided above. This examination is closed

More information

6.034 QUIZ 1. Fall 2002

6.034 QUIZ 1. Fall 2002 6.034 QUIZ Fall 2002 Name E-mail Problem Number Maximum Score Problem 30 Problem 2 30 Problem 3 0 Problem 4 30 Total 00 Problem : Rule-Based Book Recommendations (30 points) Part A: Forward chaining (5

More information

CSCI544, Fall 2016: Assignment 1

CSCI544, Fall 2016: Assignment 1 CSCI544, Fall 2016: Assignment 1 Due Date: September 23 rd, 4pm. Introduction The goal of this assignment is to get some experience implementing the simple but effective machine learning technique, Naïve

More information

(Due to rounding, values below may be only approximate estimates.) We will supply these numbers as they become available.

(Due to rounding, values below may be only approximate estimates.) We will supply these numbers as they become available. Below, for each problem on this Midterm Exam, Perfect is the percentage of students who received full credit, Partial is the percentage who received partial credit, and Zero is the percentage who received

More information

Exam 2. Name: UVa ID:

Exam 2. Name: UVa  ID: University of Virginia Out: 28 November 2011 cs1120: Introduction of Computing Due: 11:01 am, Wednesday, 30 November Explorations in Language, Logic, and Machines Exam 2 Name: UVa Email ID: Directions

More information

Bayesian Classification Using Probabilistic Graphical Models

Bayesian Classification Using Probabilistic Graphical Models San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University

More information

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,

More information

Final Examination. Winter Problem Points Score. Total 180

Final Examination. Winter Problem Points Score. Total 180 CS243 Winter 2002-2003 You have 3 hours to work on this exam. The examination has 180 points. Please budget your time accordingly. Write your answers in the space provided on the exam. If you use additional

More information

CS 540-1: Introduction to Artificial Intelligence

CS 540-1: Introduction to Artificial Intelligence CS 540-1: Introduction to Artificial Intelligence Exam 1: 7:15-9:15pm, October 11, 1995 CLOSED BOOK (one page of notes allowed) Write your answers on these pages and show your work. If you feel that a

More information

1 Training/Validation/Testing

1 Training/Validation/Testing CPSC 340 Final (Fall 2015) Name: Student Number: Please enter your information above, turn off cellphones, space yourselves out throughout the room, and wait until the official start of the exam to begin.

More information

6.00 Introduction to Computer Science and Programming Fall 2008

6.00 Introduction to Computer Science and Programming Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.00 Introduction to Computer Science and Programming Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

CS 151 Midterm. Instructions: Student ID. (Last Name) (First Name) Signature

CS 151 Midterm. Instructions: Student ID. (Last Name) (First Name) Signature CS 151 Midterm Name Student ID Signature :, (Last Name) (First Name) : : Instructions: 1. Please verify that your paper contains 11 pages including this cover. 2. Write down your Student-Id on the top

More information

MATH 1075 Final Exam

MATH 1075 Final Exam Autumn 2018 Form C Name: Signature: OSU name.#: Lecturer: Recitation Instructor: Recitation Time: MATH 1075 Final Exam Instructions: You will have 1 hour and 45 minutes to take the exam. Show ALL work

More information

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict

More information

Good Cell, Bad Cell: Classification of Segmented Images for Suitable Quantification and Analysis

Good Cell, Bad Cell: Classification of Segmented Images for Suitable Quantification and Analysis Cell, Cell: Classification of Segmented Images for Suitable Quantification and Analysis Derek Macklin, Haisam Islam, Jonathan Lu December 4, 22 Abstract While open-source tools exist to automatically segment

More information

BIG DATA SCIENTIST Certification. Big Data Scientist

BIG DATA SCIENTIST Certification. Big Data Scientist BIG DATA SCIENTIST Certification Big Data Scientist Big Data Science Professional (BDSCP) certifications are formal accreditations that prove proficiency in specific areas of Big Data. To obtain a certification,

More information

Spring 2007 Midterm Exam

Spring 2007 Midterm Exam 15-381 Spring 2007 Midterm Exam Spring 2007 March 8 Name: Andrew ID: This is an open-book, open-notes examination. You have 80 minutes to complete this examination. Unless explicitly requested, we do not

More information

Without fully opening the exam, check that you have pages 1 through 11.

Without fully opening the exam, check that you have pages 1 through 11. Name: Section: Recitation Instructor: INSTRUCTIONS Fill in your name, etc. on this first page. Without fully opening the exam, check that you have pages 1 through 11. Show all your work on the standard

More information

6.034 QUIZ 1 Solutons. Fall Problem 1: Rule-Based Book Recommendations (30 points)

6.034 QUIZ 1 Solutons. Fall Problem 1: Rule-Based Book Recommendations (30 points) 6.034 QUIZ 1 Solutons Fall 2002 Problem 1: Rule-Based Book Recommendations (30 points) Part A: Forward chaining (15 points) You want to recommend books to two of your friends, so you decide to use your

More information

CSE 143, Winter 2010 Midterm Exam Wednesday February 17, 2010

CSE 143, Winter 2010 Midterm Exam Wednesday February 17, 2010 CSE 143, Winter 2010 Midterm Exam Wednesday February 17, 2010 Personal Information: Name: Section: Student ID #: TA: You have 50 minutes to complete this exam. You may receive a deduction if you keep working

More information

Final Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours.

Final Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours. CS 188 Spring 2010 Introduction to Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet. Please use non-programmable calculators

More information

CSE 131 Introduction to Computer Science Fall Exam I

CSE 131 Introduction to Computer Science Fall Exam I CSE 131 Introduction to Computer Science Fall 2015 Given: 24 September 2015 Exam I Due: End of session This exam is closed-book, closed-notes, no electronic devices allowed. The exception is the sage page

More information

Question: Total Points: Score:

Question: Total Points: Score: CS 170 Exam 1 Section 000 Spring 2015 Name (print): Instructions: Keep your eyes on your own paper and do your best to prevent anyone else from seeing your work. Do NOT communicate with anyone other than

More information

CPSC 340: Machine Learning and Data Mining. Feature Selection Fall 2016

CPSC 340: Machine Learning and Data Mining. Feature Selection Fall 2016 CPSC 34: Machine Learning and Data Mining Feature Selection Fall 26 Assignment 3: Admin Solutions will be posted after class Wednesday. Extra office hours Thursday: :3-2 and 4:3-6 in X836. Midterm Friday:

More information

Department of Computer Science Faculty of Engineering, Built Environment & IT University of Pretoria. COS122: Operating Systems

Department of Computer Science Faculty of Engineering, Built Environment & IT University of Pretoria. COS122: Operating Systems Department of Computer Science Faculty of Engineering, Built Environment & IT University of Pretoria COS122: Operating Systems Exam Opportunity 1 25 August 2018 Initials and Surname: Student Number: Degree:

More information

EECS 3214 Midterm Test Winter 2017 March 2, 2017 Instructor: S. Datta. 3. You have 120 minutes to complete the exam. Use your time judiciously.

EECS 3214 Midterm Test Winter 2017 March 2, 2017 Instructor: S. Datta. 3. You have 120 minutes to complete the exam. Use your time judiciously. EECS 3214 Midterm Test Winter 2017 March 2, 2017 Instructor: S. Datta Name (LAST, FIRST): Student number: Instructions: 1. If you have not done so, put away all books, papers, and electronic communication

More information

Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

More information

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression Journal of Data Analysis and Information Processing, 2016, 4, 55-63 Published Online May 2016 in SciRes. http://www.scirp.org/journal/jdaip http://dx.doi.org/10.4236/jdaip.2016.42005 A Comparative Study

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

Nearest Neighbor Classification

Nearest Neighbor Classification Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

UNIVERSITY REGULATIONS

UNIVERSITY REGULATIONS CPSC 221: Algorithms and Data Structures Midterm Exam, 2013 February 15 Name: Student ID: Signature: Section (circle one): MWF(201) TTh(202) You have 60 minutes to solve the 5 problems on this exam. A

More information

COS 126 General Computer Science Fall Exam 1

COS 126 General Computer Science Fall Exam 1 COS 126 General Computer Science Fall 2007 Exam 1 This test has 10 questions worth a total of 50 points. You have 120 minutes. The exam is closed book, except that you are allowed to use a one page cheatsheet,

More information

Unsupervised Learning

Unsupervised Learning Deep Learning for Graphics Unsupervised Learning Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL Timetable Niloy

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 5:00pm-6:15pm, Monday, October 26th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

University of Florida EEL 4744 Summer 2014 Dr. Eric M. Schwartz Department of Electrical & Computer Engineering 1 July Oct-14 6:41 PM

University of Florida EEL 4744 Summer 2014 Dr. Eric M. Schwartz Department of Electrical & Computer Engineering 1 July Oct-14 6:41 PM Page 1/14 Exam 1 Instructions: First Name Turn off cell phones beepers and other noise making devices. Show all work on the front of the test papers. If you need more room make a clearly indicated note

More information

CSE 143: Computer Programming II Spring 2015 Midterm Exam Solutions

CSE 143: Computer Programming II Spring 2015 Midterm Exam Solutions CSE 143: Computer Programming II Spring 2015 Midterm Exam Solutions Name: Sample Solutions ID #: 1234567 TA: The Best Section: A9 INSTRUCTIONS: You have 50 minutes to complete the exam. You will receive

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information