CSC 458 Data Mining and Predictive Analytics I, Fall 2017 (November 22, 2017)
Dr. Dale E. Parson, Assignment 4: Comparing Weka Bayesian, clustering, ZeroR, OneR, and J48 models to predict nominal dissolved oxygen levels, in an extension of Assignments 2 and 3.

Due by 11:59 PM on Friday December 8 via make turnitin. I will not accept late solutions after the end of Sunday December 10 because I need to post my solution to help with your exam preparation; assignments coming in after December 10 earn 0%. If you are not accustomed to using the Linux acad system, see me during office hours or an in-class lab session, or consult a graduate assistant in Old Main 257. I will not accept student work via D2L for this assignment.

You can do all of your work on your own machine or on the campus PCs, obtaining the starting files via S:\ComputerScience\Parson\Weka on November 27. You can also log into acad and perform the following steps to retrieve the same files. You can use the FileZilla client utility or a similar file transfer program to copy files from acad and to place your solution files back onto acad. Assignment 3's handout shows how to install and use FileZilla with acad.

There will be at least one in-class work session for this assignment, and unless you are registered for the 100% on-line sections, I expect you to attend with questions, either in the room or at class time via Ultra. 100% on-line students are encouraged to attend in Old Main 158 or nearby labs at class time if schedules permit.

Perform the following steps to set up for this project. Start out in your login directory on csit (a.k.a. acad).

    cd $HOME
    mkdir DataMine      # This should already be there from assignment 2.
    cp ~parson/datamine/bayes458fall2017.problem.zip DataMine/bayes458fall2017.problem.zip
    cd ./DataMine
    unzip bayes458fall2017.problem.zip
    cd ./bayes458fall2017

This is the directory from which you must run make turnitin by the project deadline to avoid a 10% per day late penalty.
If you run out of file space in your account, you can perform the following steps from within your DataMine/ directory. Be extremely careful, and do NOT use any file name wildcards. This will discard your results from previous assignments. If you wish to keep those, do not remove directories prepdata1, ruletree458fall2017, or linear458fall2017.

    rm -rf prepdata1.problem.zip prepdata1.solution.zip prepdata1
    rm -rf ruletree458fall2017.problem.zip ruletree458fall2017.solution.zip ruletree458fall2017
    rm -rf linear458fall2017.problem.zip linear458fall2017.solution.zip linear458fall2017

You will see the following files in this bayes458fall2017 directory:

    readme.txt                              Your answers to Q1 through Q20 below go here, in the required format.
    csc458fall2017assn4trainingset49k.arff  The ARFF file created by assignment 3.
    makefile                                Files needed to make turnitin to get your solution to me.
    checkfiles.sh
    makelib
How can you avoid running out of memory in Weka?

1. Run Weka using a command line or batch script that sets memory size. I run it this way on my Mac:

    java -server -Xmx4000M -jar /Applications/weka-3-8-0/weka.jar

That requires having the Java runtime environment (not necessarily the Java compiler) installed on your machine (true of campus PCs), and locating the path to the weka.jar Java archive that contains the Weka class libraries and other resources. The -Xmx4000M option allocates 4,000 megabytes (about 4 GB) of heap storage for Weka. As for assignment 2, I have created batch file S:\ComputerScience\WEKA\WekaWith2GBcampus.bat for campus PCs, with handout data files in S:\ComputerScience\Parson\Weka\. I plan to create a 4-gigabyte script S:\ComputerScience\WEKA\WekaWith4GBcampus.bat after I return to campus on November 8. Try using that. It will contain this command line:

    java -Xmx4096M -jar "S:\ComputerScience\WEKA\weka.jar"

2. Right-click results buffers in the Weka -> Classify window, or use Alt-click on Mac (control-click on PC), to Delete result buffer after you are done with one. They take up space. You can also save these results to text files via this menu.

3. Some of these models take a long time to execute. I have noted that condition in these instructions. In such cases, it may save time just to exit Weka and restart it via the command line or a batch file with a large memory limit, rather than just deleting result buffers.

PART I: Preparing your ARFF file. (30% of project grade.) Answer the questions at steps 4 & 5.

1. Open csc458fall2017assn4trainingset49k.arff in Weka's Preprocess tab.

2. Remove TimeOfYear because it is redundant with MinuteOfYear and MinuteFromNewYear. We are leaving month in the attribute set for now. (Note: Some machine learning algorithms such as J48 and other decision trees may perform better using partially redundant attributes.
A low-resolution attribute such as TimeOfYear may contribute to a more general tree that is less prone to over-fitting than a high-resolution attribute such as MinuteFromNewYear; also, a redundant attribute may help to fine-tune a complex tree. However, the NaiveBayes statistical technique assumes statistical independence of non-class attributes, and may be more accurate after removing redundant attributes. We are keeping MinuteFromNewYear because we can always coarsen its resolution later via discretization. Once an attribute such as MinuteFromNewYear is in low-resolution form such as the 4-valued TimeOfYear, it is impossible to get the high resolution of MinuteFromNewYear back.)

3. Remove TimeOfDay because it is redundant with MinuteOfDay and MinuteFromMidnite. Reasoning is similar to that in step 2.

4. Remove MinuteOfYear because it is redundant with MinuteFromNewYear, and it correlates nonlinearly with a remaining numeric attribute that is not derived from the datetime of the water sample, while MinuteFromNewYear correlates linearly with that same attribute. You can use Weka's Visualize tab to decide which numeric attribute that is not derived from datetime correlates linearly with MinuteFromNewYear (but not linearly with MinuteOfYear), or you can use your knowledge gained from assignments 2 and 3. What is this numeric attribute that is not derived from a datetime attribute? (5 of the 30% for this question)

5. Remove MinuteOfDay because it is redundant with MinuteFromMidnite. We are keeping MinuteFromMidnite because it correlates positively with an underlying mechanism for increasing dissolved oxygen found in the assignment 2 readings. What is this underlying mechanism? (5 of the 30% for this question)

6. Create a new derived attribute HourFromMidnite by using the Weka unsupervised -> attribute filter AddExpression to divide MinuteFromMidnite by the number of minutes in an hour. Look at the statistics and graph on the right side of the Weka Preprocess tab to ensure that these attributes have the same distribution. After verifying that HourFromMidnite is an accurate representation of MinuteFromMidnite in terms of hours, remove MinuteFromMidnite. We are doing this because HourFromMidnite is easier to think about. There are only 12 possible hours from the closest midnight (before or after the sample datetime), in contrast to 720 minutes. HourFromMidnite preserves the fine-grain resolution of MinuteFromMidnite in its fractional part.

7.
Create a new derived attribute DayFromNewYear by using the Weka unsupervised -> attribute filter AddExpression to divide MinuteFromNewYear by the number of minutes in a day. Look at the statistics and graph on the right side of the Weka Preprocess tab to ensure that these attributes have the same distribution. After verifying that DayFromNewYear is an accurate representation of MinuteFromNewYear, remove MinuteFromNewYear. We are doing this because DayFromNewYear is easier to think about. There are only 183 possible days from midnight on the closest January 1 (before or after the sample datetime), in contrast to 263,520 minutes. DayFromNewYear preserves the fine-grain resolution of MinuteFromNewYear in its fractional part.

8. Discretize OxygenMgPerLiter into 10 discrete bins as in assignment 2. Bayesian analysis requires a nominal target attribute (a.k.a. class). Keep useEqualFrequency as False. Do NOT discretize any other numeric attributes at this time.

9. Reorder the attributes to put OxygenMgPerLiter in the last (target) position, without disturbing the relative order of the other attributes. At the end of this step you MUST have these attributes in this order.
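The arithmetic behind steps 6 through 8 can be sketched outside of Weka. The following Python sketch is a hypothetical illustration, not Weka code: the function names are mine, and equal_width_bins mimics what equal-width discretization (useEqualFrequency = False) does in principle.

```python
def to_hours(minute_from_midnite):
    # Step 6: HourFromMidnite = MinuteFromMidnite / 60.0.
    # Division by a constant preserves resolution in the fractional part.
    return minute_from_midnite / 60.0

def to_days(minute_from_new_year):
    # Step 7: DayFromNewYear = MinuteFromNewYear / 1440.0 (minutes per day).
    return minute_from_new_year / 1440.0

def equal_width_bins(values, n_bins=10):
    # Step 8: equal-width discretization (useEqualFrequency = False).
    # Every bin spans (max - min) / n_bins, no matter how many instances
    # land in it. Assumes max(values) > min(values).
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Scaling by a constant preserves the shape of the distribution, which is
# why the Preprocess graphs of the old and new attributes should match.
hours = [to_hours(m) for m in (0, 90, 360, 720)]   # 0.0, 1.5, 6.0, 12.0
```

Because the derived attribute is a constant multiple of the original, its histogram in the Preprocess tab should differ only in axis scale; a difference in shape means the AddExpression was typed incorrectly.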
10. Randomize the order of instances using your unique seed value as in Assignments 2 & 3. Save this as ARFF file csc458fall2017assn4nominaltrainingset49k.arff. It is the name of the input ARFF file with the word Nominal inserted. You must put this into your bayes458fall2017/ project directory before you run make turnitin. Work with csc458fall2017assn4nominaltrainingset49k.arff throughout the remainder of this assignment. We are using 10-fold cross validation with these 49K instances as the training & test dataset in this assignment.

Each of Q1 through Q10 is worth 7% of the total project grade.

Q1: On this initial set of attributes in this 49K set of measurements, run the following classifiers in the order shown below, and record only these results in your answer. See footnote [1] for the Kappa statistic.

ZeroR:
    Relative absolute error          %
    Root relative squared error      %

[1] From a linked site: The Kappa statistic (or value) is a metric that compares an Observed Accuracy with an Expected Accuracy (random chance). The kappa statistic is used not only to evaluate a single classifier, but also to evaluate classifiers amongst themselves. In addition, it takes into account random chance (agreement with a random classifier), which generally means it is less misleading than simply using accuracy as a metric (an Observed Accuracy of 80% is a lot less impressive with an Expected Accuracy of 75% than with an Expected Accuracy of 50%).

    Kappa = (observed accuracy - expected accuracy) / (1 - expected accuracy)

Not only can this kappa statistic shed light on how the classifier itself performed, the kappa statistic for one model is directly comparable to the kappa statistic for any other model used for the same classification task.

Parson's example: If you had a 6-sided die that had the value 1 on 5 sides, and 0 on the other, the random-chance expected accuracy of rolling a 1 would be 5/6 = 83.3%.
Since the ZeroR classifier simply picks the most statistically likely class without respect to the other (non-target) attributes, it would pick an expected die value of 1 in this case, giving a random observed accuracy of 83.3%, and a Kappa of (0.833 - 0.833) / (1 - 0.833) = 0.

Also from this linked site: Landis and Koch consider 0-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1.00 as almost perfect. Fleiss considers kappas > 0.75 as excellent, 0.40-0.75 as fair to good, and < 0.40 as poor. It is important to note that both scales are somewhat arbitrary. At least two further considerations should be taken into account when interpreting the kappa statistic. First, the kappa statistic should always be compared with an accompanying confusion matrix if possible to obtain the most accurate interpretation. Second, acceptable kappa statistic values vary with the context. For instance, in many inter-rater reliability studies with easily observable behaviors, kappa statistic values below 0.70 might be considered low. However, in studies using machine learning to explore unobservable phenomena like cognitive states such as day dreaming, kappa statistic values above 0.40 might be considered exceptional.
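The footnote's formula can be checked with a short Python sketch (my own illustration, not Weka's implementation). Expected accuracy here is estimated from the confusion-matrix marginals, which is the usual way to quantify chance agreement:

```python
def kappa(confusion):
    # confusion[i][j] = count of instances of true class i predicted as j.
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: for each class, the product of its row and column
    # marginal proportions, summed over all classes.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(len(confusion))
    )
    return (observed - expected) / (1 - expected)

# Parson's die: ZeroR always predicts the majority value 1, so it is right
# 5 times out of 6, observed equals expected, and kappa is 0.
die = [[5, 0],   # true 1: five rolls, all predicted as 1
       [1, 0]]   # true 0: one roll, also predicted as 1
```

For contrast, a classifier that is 90% correct on two balanced classes, e.g. kappa([[45, 5], [5, 45]]), scores a kappa of 0.8, well above its 0.5 expected accuracy.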
OneR:
    Relative absolute error          %
    Root relative squared error      %
J48:
    Relative absolute error          %
    Root relative squared error      %
NaiveBayes:
    Relative absolute error          %
    Root relative squared error      %
BayesNet:
    Relative absolute error          %
    Root relative squared error      %

Examine the conditional probability table in the output of NaiveBayes and the graph of BayesNet. You can see the latter, partially illustrated in the original handout's figure, by Alt-clicking BayesNet in the Classify tab's result list and selecting Visualize graph. Clicking a node in the graph shows its conditional probabilities. BayesNet is sometimes more accurate than NaiveBayes because NaiveBayes assumes statistical independence of the non-class attributes, while BayesNet does not. BayesNet attempts to model statistical interdependence among these attributes. In the BayesNet illustration, clicking OxygenMgPerLiter reveals the probability distribution of its 10 discretized bins. Clicking other nodes that are successors (downstream) in the directed acyclic graph reveals more complicated tables. In the illustrated table for TempCelsius, BayesNet auto-discretizes TempCelsius, and then gives conditional probabilities for OxygenMgPerLiter's bins, given discrete bins for TempCelsius. Note how the probability for the low-level ( ] bin of OxygenMgPerLiter changes going left-to-right from lower to higher TempCelsius, and the probability for the high-level ( ] bin of OxygenMgPerLiter also changes with increases in TempCelsius. BayesNet takes all of the probabilities in all graph nodes for a given bin of OxygenMgPerLiter, multiplies them together, normalizes the result into the range 0%-100%, and uses this number to predict the probability of that bin of the class (target attribute), given all other attribute value bins. While the graph auto-generates from OxygenMgPerLiter as the class, it is possible to use expertise to hand-design a graph. Again, the main benefit of BayesNet over NaiveBayes in some cases is BayesNet's non-assumption of conditional independence among the non-class attributes.
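The multiply-and-normalize computation described above can be sketched in Python. This is a toy illustration with invented probabilities, not Weka's implementation: under NaiveBayes each attribute is conditioned only on the class, which is exactly the independence assumption; BayesNet instead conditions each node on its graph parents.

```python
def naive_bayes_posterior(priors, conditionals, observed):
    # priors:       {class: P(class)}
    # conditionals: {class: {attribute: {value: P(value | class)}}}
    # observed:     {attribute: value}
    # Multiply the class prior by each attribute's conditional probability,
    # then normalize the scores so they sum to 1.
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for attribute, value in observed.items():
            p *= conditionals[cls][attribute][value]
        scores[cls] = p
    z = sum(scores.values())
    return {cls: p / z for cls, p in scores.items()}

# Invented numbers: warm water makes the low-oxygen bin more likely.
post = naive_bayes_posterior(
    {'lowO2': 0.5, 'highO2': 0.5},
    {'lowO2':  {'TempCelsius': {'warm': 0.8, 'cold': 0.2}},
     'highO2': {'TempCelsius': {'warm': 0.2, 'cold': 0.8}}},
    {'TempCelsius': 'warm'})
```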
Q2: From NaiveBayes, copy & paste the mean row for HourFromMidnite as it correlates with OxygenMgPerLiter in the 10 columns.

    Attribute        '(range]' '(range]' '(range]' '(range]' '(range]' '(range]' '(range]' '(range]' '(range]' '(range)'
    HourFromMidnite  mean

What change-in-value pattern does class attribute OxygenMgPerLiter exhibit as it goes left-to-right across increasing distances in HourFromMidnite, particularly for late morning through afternoon? From the analyses of assignments 2 and 3, what is the underlying physical or chemical cause of this pattern?

Q3: From the BayesNet graph node for month, what probability-of-occurrence pattern does the low-level ( ] bin of OxygenMgPerLiter exhibit as it goes left-to-right across increasing months from 1 (January) through 12 (December)? From the analyses of assignments 2 and 3, what is the underlying physical or chemical cause of this pattern?

Alt-click each result except NaiveBayes in the Classify tab's result list and Delete result buffer to recover some storage. Note the value of Correctly Classified Instances for NaiveBayes with this full attribute set. Then, for each of the non-class attributes, starting at ph and working your way, one at a time, down through DayFromNewYear, perform the following steps in a loop:

A. Remove the next non-class attribute and run NaiveBayes.
B. If Correctly Classified Instances increases or stays the same after this removal, leave that attribute removed; otherwise (Correctly Classified Instances has decreased from its maximum NaiveBayes value so far), execute Undo to restore the attribute.
C. Note which attributes you have removed without a subsequent Undo to restore them.
D. You can use Delete result buffer to recover some storage. I kept only the NaiveBayes result with the greatest Correctly Classified Instances so far to help me keep track of this maximum.
E. Repeat steps A-D, one attribute at a time, until you have removed, tested, and conditionally restored each non-class attribute through DayFromNewYear, which is the last non-class attribute.

Q4: After completing the above steps, which attribute or attributes did you permanently remove?

Q5: Which of the permanently removed attribute(s) of Q4, if any, correlate with a remaining attribute, based on the analyses of assignments 2 and 3? With which of the remaining non-class attributes do these removed attribute(s) correlate? Other removed attributes simply do not correlate well with OxygenMgPerLiter, so their removal decreases error in NaiveBayes. The removed attributes of Q5, on the other hand, violate the statistical independence assumption of NaiveBayes, and so their removal reduces error introduced by violating this assumption.

Q6: Repeat step Q1 with this reduced attribute set and record the same results here for those same exact classifiers ZeroR, OneR, J48, NaiveBayes, and BayesNet.

Q7: In going from the full attribute set of Q1 to the reduced attribute set of Q6, which classifier(s) improved accuracy in terms of Correctly Classified Instances? Why did it or they improve?

Q8: In going from the full attribute set of Q1 to the reduced attribute set of Q6, which classifier(s) show decreased accuracy in terms of Correctly Classified Instances? Why did it or they get worse?

Q9: In going from the full attribute set of Q1 to the reduced attribute set of Q6, which classifier(s) show no change in accuracy in terms of Correctly Classified Instances? Why did it or they show no change?

Q10: Run SimpleKMeans clustering with 3 clusters and complete the table below by using Cut and Paste from the Weka results. Make a pairwise comparison between the Full Data centroids and Clusters 0, 1, and 2, i.e., pair Full Data with each of the others in turn and compare changes from the overall centroids of Full Data.
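The removal loop of steps A through E above is a greedy backward elimination. The following Python sketch shows its control flow; the evaluate argument is a hypothetical stand-in for running NaiveBayes in Weka and reading Correctly Classified Instances, and the attribute names and accuracy numbers below are invented.

```python
def backward_eliminate(attributes, evaluate):
    # Visit each non-class attribute once, in order (step E).
    # Keep a removal only if accuracy does not drop below the best value
    # seen so far (step B); otherwise "Undo" by leaving `kept` unchanged.
    kept = list(attributes)
    best = evaluate(kept)
    removed = []
    for attribute in list(attributes):
        trial = [a for a in kept if a != attribute]
        acc = evaluate(trial)
        if acc >= best:
            kept, best = trial, acc
            removed.append(attribute)   # leave it removed (step C)
    return kept, removed

# Invented accuracy model: 'Redundant' violates the independence
# assumption and costs NaiveBayes 3 points; the others are informative.
def fake_accuracy(attrs):
    acc = 80.0 - (3.0 if 'Redundant' in attrs else 0.0)
    for needed in ('ph', 'TempCelsius'):
        if needed not in attrs:
            acc -= 10.0
    return acc

kept, removed = backward_eliminate(['ph', 'Redundant', 'TempCelsius'],
                                   fake_accuracy)
```

The order of visits matters in a greedy pass like this, which is why the handout fixes the order from ph down through DayFromNewYear.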
Describe any correlations you see in changes for TempCelsius and OxygenMgPerLiter in going from Full Data to the respective Cluster 0, 1, and 2. Do any of the other non-class attributes ph, Conductance, or HourFromMidnite show a similarly clear correlation with OxygenMgPerLiter?

Final cluster centroids:

                                    Cluster#
    Attribute           Full Data       0          1          2
                           ( )         (N)        (N)        (N)
    ============================================================
    ph
    TempCelsius
    Conductance
    HourFromMidnite
    OxygenMgPerLiter
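One way to make the Q10 pairwise comparison systematic is to subtract the Full Data centroid from each cluster centroid, attribute by attribute, and look for attributes whose signs move together. A Python sketch with invented centroid values (not real results from this dataset):

```python
def centroid_deltas(full, clusters):
    # full:     {attribute: mean over all instances (Full Data centroid)}
    # clusters: list of {attribute: cluster centroid mean}
    # Returns the signed change of each attribute per cluster.
    return [{a: round(c[a] - full[a], 3) for a in full} for c in clusters]

# Invented numbers illustrating the expected inverse relationship
# between water temperature and dissolved oxygen.
deltas = centroid_deltas(
    {'TempCelsius': 15.0, 'OxygenMgPerLiter': 9.0},
    [{'TempCelsius': 25.0, 'OxygenMgPerLiter': 7.0},
     {'TempCelsius': 5.0,  'OxygenMgPerLiter': 11.0}])
```

In this made-up example TempCelsius rises exactly where OxygenMgPerLiter falls, and vice versa; that kind of paired movement is what the question asks you to describe.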
Added NOTE 11/26/2017: In some cases a BayesNet will create a graph node that looks like this for an attribute [illustration omitted: a node whose conditional probability table holds only the constant 1]. In that case you should remove the attribute from the set of attributes, since a constant multiplier of 1 contributes nothing to the conditional probability calculation for the attribute being estimated.
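The reason the note gives for removing such an attribute can be verified directly: a node whose conditional probability is 1 for every class multiplies every class score by the same constant, so the normalized result cannot change. A toy Python sketch with invented numbers:

```python
def normalize(scores):
    # Turn raw class scores into probabilities that sum to 1.
    z = sum(scores.values())
    return {cls: s / z for cls, s in scores.items()}

base = {'lowO2': 0.30, 'highO2': 0.10}
# The degenerate node contributes a factor of exactly 1 to each class,
# leaving both the winning class and the normalized posterior unchanged.
with_constant_node = {cls: s * 1.0 for cls, s in base.items()}
```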
More informationIn this project, I examined methods to classify a corpus of s by their content in order to suggest text blocks for semi-automatic replies.
December 13, 2006 IS256: Applied Natural Language Processing Final Project Email classification for semi-automated reply generation HANNES HESSE mail 2056 Emerson Street Berkeley, CA 94703 phone 1 (510)
More informationCPS122 Lecture: From Python to Java
Objectives: CPS122 Lecture: From Python to Java last revised January 7, 2013 1. To introduce the notion of a compiled language 2. To introduce the notions of data type and a statically typed language 3.
More informationCMSC 201 Spring 2017 Lab 01 Hello World
CMSC 201 Spring 2017 Lab 01 Hello World Assignment: Lab 01 Hello World Due Date: Sunday, February 5th by 8:59:59 PM Value: 10 points At UMBC, our General Lab (GL) system is designed to grant students the
More informationData Preparation. UROŠ KRČADINAC URL:
Data Preparation UROŠ KRČADINAC EMAIL: uros@krcadinac.com URL: http://krcadinac.com Normalization Normalization is the process of rescaling the values to a specific value scale (typically 0-1) Standardization
More informationData Mining Laboratory Manual
Data Mining Laboratory Manual Department of Information Technology MLR INSTITUTE OF TECHNOLOGY Marri Laxman Reddy Avenue, Dundigal, Gandimaisamma (M), R.R. Dist. Data Mining Laboratory Manual Prepared
More informationWEKA Explorer User Guide for Version 3-4
WEKA Explorer User Guide for Version 3-4 Richard Kirkby Eibe Frank July 28, 2010 c 2002-2010 University of Waikato This guide is licensed under the GNU General Public License version 2. More information
More informationData Mining. Lesson 9 Support Vector Machines. MSc in Computer Science University of New York Tirana Assoc. Prof. Dr.
Data Mining Lesson 9 Support Vector Machines MSc in Computer Science University of New York Tirana Assoc. Prof. Dr. Marenglen Biba Data Mining: Content Introduction to data mining and machine learning
More informationCOMP 3400 Programming Project : The Web Spider
COMP 3400 Programming Project : The Web Spider Due Date: Worth: Tuesday, 25 April 2017 (see page 4 for phases and intermediate deadlines) 65 points Introduction Web spiders (a.k.a. crawlers, robots, bots,
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationLinux File System and Basic Commands
Linux File System and Basic Commands 0.1 Files, directories, and pwd The GNU/Linux operating system is much different from your typical Microsoft Windows PC, and probably looks different from Apple OS
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationClassification using Weka (Brain, Computation, and Neural Learning)
LOGO Classification using Weka (Brain, Computation, and Neural Learning) Jung-Woo Ha Agenda Classification General Concept Terminology Introduction to Weka Classification practice with Weka Problems: Pima
More informationWorking with Basic Linux. Daniel Balagué
Working with Basic Linux Daniel Balagué How Linux Works? Everything in Linux is either a file or a process. A process is an executing program identified with a PID number. It runs in short or long duration
More informationContact No office hours, but is checked multiple times daily. - Specific questions/issues, particularly conceptual
CS III: Lab Hi Contact - Email : jadamek2@kent.edu - No office hours, but email is checked multiple times daily. - Specific questions/issues, particularly conceptual ones. - Only exception: really odd
More informationLAD-WEKA Tutorial Version 1.0
LAD-WEKA Tutorial Version 1.0 March 25, 2014 Tibérius O. Bonates tb@ufc.br Federal University of Ceará, Brazil Vaux S. D. Gomes vauxgomes@gmail.com Federal University of the Semi-Arid, Brazil 1 Requirements
More informationContents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...
Contents Note: pay attention to where you are........................................... 1 Note: Plaintext version................................................... 1 Hello World of the Bash shell 2 Accessing
More informationShort instructions on using Weka
Short instructions on using Weka G. Marcou 1 Weka is a free open source data mining software, based on a Java data mining library. Free alternatives to Weka exist as for instance R and Orange. The current
More informationThe Data Mining Application Based on WEKA: Geographical Original of Music
Management Science and Engineering Vol. 10, No. 4, 2016, pp. 36-46 DOI:10.3968/8997 ISSN 1913-0341 [Print] ISSN 1913-035X [Online] www.cscanada.net www.cscanada.org The Data Mining Application Based on
More informationCSC209H Lecture 1. Dan Zingaro. January 7, 2015
CSC209H Lecture 1 Dan Zingaro January 7, 2015 Welcome! Welcome to CSC209 Comments or questions during class? Let me know! Topics: shell and Unix, pipes and filters, C programming, processes, system calls,
More informationInstalling and Upgrading Cisco Network Registrar Virtual Appliance
CHAPTER 3 Installing and Upgrading Cisco Network Registrar Virtual Appliance The Cisco Network Registrar virtual appliance includes all the functionality available in a version of Cisco Network Registrar
More informationKNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Data Understanding Exercise: Market Basket Analysis Exercise:
More informationChapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction
CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle
More informationLaboratory 1: Eclipse and Karel the Robot
Math 121: Introduction to Computing Handout #2 Laboratory 1: Eclipse and Karel the Robot Your first laboratory task is to use the Eclipse IDE framework ( integrated development environment, and the d also
More informationComputing a Gain Chart. Comparing the computation time of data mining tools on a large dataset under Linux.
1 Introduction Computing a Gain Chart. Comparing the computation time of data mining tools on a large dataset under Linux. The gain chart is an alternative to confusion matrix for the evaluation of a classifier.
More informationBEMIDJI STATE UNIVERSITY COLLEGE OF BUSINESS, TECHNOLOGY AND COMMUNICATION Course syllabus Fall 2011
BEMIDJI STATE UNIVERSITY COLLEGE OF BUSINESS, TECHNOLOGY AND COMMUNICATION Course syllabus Fall 2011 COURSE: Computer Business Application - (BUAD 2280-01) COURSE CREDIT: INSTRUCTOR: 3.0 Credit Hours Mehdi
More informationCSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209
CSC209 Software Tools and Systems Programming https://mcs.utm.utoronto.ca/~209 What is this Course About? Software Tools Using them Building them Systems Programming Quirks of C The file system System
More informationSoftware Testing. 1. Testing is the process of demonstrating that errors are not present.
What is Testing? Software Testing Many people understand many definitions of testing :. Testing is the process of demonstrating that errors are not present.. The purpose of testing is to show that a program
More informationDecision Trees Using Weka and Rattle
9/28/2017 MIST.6060 Business Intelligence and Data Mining 1 Data Mining Software Decision Trees Using Weka and Rattle We will mainly use Weka ((http://www.cs.waikato.ac.nz/ml/weka/), an open source datamining
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationPROJECT 1 DATA ANALYSIS (KR-VS-KP)
PROJECT 1 DATA ANALYSIS (KR-VS-KP) Author: Tomáš Píhrt (xpiht00@vse.cz) Date: 12. 12. 2015 Contents 1 Introduction... 1 1.1 Data description... 1 1.2 Attributes... 2 1.3 Data pre-processing & preparation...
More informationText classification with Naïve Bayes. Lab 3
Text classification with Naïve Bayes Lab 3 1 The Task Building a model for movies reviews in English for classifying it into positive or negative. Test classifier on new reviews Takes time 2 Sentiment
More informationPractical Data Mining COMP-321B. Tutorial 5: Article Identification
Practical Data Mining COMP-321B Tutorial 5: Article Identification Shevaun Ryan Mark Hall August 15, 2006 c 2006 University of Waikato 1 Introduction This tutorial will focus on text mining, using text
More informationTools for Annotating and Searching Corpora Practical Session 1: Annotating
Tools for Annotating and Searching Corpora Practical Session 1: Annotating Stefanie Dipper Institute of Linguistics Ruhr-University Bochum Corpus Linguistics Fest (CLiF) June 6-10, 2016 Indiana University,
More informationClearing Out Legacy Electronic Records
For whom is this guidance intended? Clearing Out Legacy Electronic Records This guidance is intended for any member of University staff who has a sizeable collection of old electronic records, such as
More informationBy Ludovic Duvaux (27 November 2013)
Array of jobs using SGE - an example using stampy, a mapping software. Running java applications on the cluster - merge sam files using the Picard tools By Ludovic Duvaux (27 November 2013) The idea ==========
More informationProgramming Assignments
ELEC 486/586, Summer 2017 1 Programming Assignments 1 General Information 1.1 Software Requirements Detailed specifications are typically provided for the software to be developed for each assignment problem.
More informationHomework # 4. Example: Age in years. Answer: Discrete, quantitative, ratio. a) Year that an event happened, e.g., 1917, 1950, 2000.
Homework # 4 1. Attribute Types Classify the following attributes as binary, discrete, or continuous. Further classify the attributes as qualitative (nominal or ordinal) or quantitative (interval or ratio).
More informationEMC ViPR SRM. Data Enrichment and Chargeback Guide. Version
EMC ViPR SRM Version 4.0.2.0 Data Enrichment and Chargeback Guide 302-003-448 01 Copyright 2016-2017 Dell Inc. or its subsidiaries All rights reserved. Published January 2017 Dell believes the information
More informationMidterm Examination CS 540-2: Introduction to Artificial Intelligence
Midterm Examination CS 54-2: Introduction to Artificial Intelligence March 9, 217 LAST NAME: FIRST NAME: Problem Score Max Score 1 15 2 17 3 12 4 6 5 12 6 14 7 15 8 9 Total 1 1 of 1 Question 1. [15] State
More informationUser Guide Written By Yasser EL-Manzalawy
User Guide Written By Yasser EL-Manzalawy 1 Copyright Gennotate development team Introduction As large amounts of genome sequence data are becoming available nowadays, the development of reliable and efficient
More informationDATA MINING LAB MANUAL
DATA MINING LAB MANUAL Subtasks : 1. List all the categorical (or nominal) attributes and the real-valued attributes seperately. Attributes:- 1. checking_status 2. duration 3. credit history 4. purpose
More informationEXAM PREPARATION GUIDE
When Recognition Matters EXAM PREPARATION GUIDE PECB Certified ISO 31000 Risk Manager www.pecb.com The objective of the PECB Certified ISO 31000 Risk Manager examination is to ensure that the candidate
More informationRoll Marking Secondary School Tech Tip
Roll Marking Secondary School Tech Tip Index Roll Marking Secondary School... 1 Roll Marking Periods... 2 Holiday and Term Dates... 3 Check the Attendance Settings... 4 Marking the Roll... 6 Unmarked Rolls
More informationCPSC 150 Laboratory Manual. Lab 1 Introduction to Program Creation
CPSC 150 Laboratory Manual A Practical Approach to Java, jedit & WebCAT Department of Physics, Computer Science & Engineering Christopher Newport University Lab 1 Introduction to Program Creation Welcome
More informationA Comparative Study of Selected Classification Algorithms of Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.220
More information