CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM


4.1 Introduction

Nowadays, money investment in the stock market gains major attention because of its dynamic nature. A significant issue in market finance is therefore the discovery of well-organized approaches to summarize and visualize stock market information, so as to provide individuals or organizations with helpful information about the behavior of the market when making investment decisions. The huge amount of important information produced by the stock market has attracted researchers to investigate this problem using distinctive approaches. Since stock markets produce huge datasets, data mining techniques are found to be more efficient for this task. Data mining is utilized to excavate data from databases and discover meaningful patterns in them, and the usefulness of these patterns makes data mining imperative and necessary. The essentials of data mining in finance originate from the need to adopt specific, well-organized criteria for prediction accuracy and to facilitate multiresolution calculation.

4.2 K-Nearest Neighbor (KNN)

In pattern recognition, KNN is a technique for categorizing items according to the nearest training samples. KNN is a kind of instance-based learning, or lazy learning, in which the target function is approximated only locally and all computation is deferred until classification.

Assumptions in KNN

KNN assumes that the data lie in a feature space; more precisely, the data points are in a metric space. Mostly these data are either scalars or multidimensional vectors. Since the points are in a feature space, they have a concept of distance. This distance does not need to be Euclidean, although the Euclidean distance is the one commonly used. Every training sample comprises a set of vectors with a separate class label associated with each vector. These classes may simply be positive and negative, but KNN can also handle tasks with an arbitrary number of classes.
Additionally, a single number k is given. This number decides how many neighbors (where neighbors are defined by the distance metric) influence the classification. It is typically an odd number when the number of classes is 2. In the event that k = 1, the algorithm is simply called the nearest neighbor algorithm.

Basics of KNN

KNN is the principal and most straightforward classification technique when knowledge about the distribution of the data is insufficient. This rule simply retains the entire training set during learning and assigns to every query a class determined by the majority label of its k nearest neighbors in the training set. The Nearest Neighbor (NN) rule is the simplest form of KNN, obtained when K = 1: every sample is classified according to the samples surrounding it, so if the class of a sample is unknown it can be predicted from the classes of its nearest neighbors. Given an unknown sample and a training set, all the distances between the unknown sample and every sample in the training set can be calculated using the equation

    D(x, u) = \sqrt{(x_1 - u_1)^2 + (x_2 - u_2)^2 + \cdots + (x_p - u_p)^2}    (4.1)

where x_1, x_2, ..., x_p are the predictors of the first sample and u_1, u_2, ..., u_p are the predictors of the second sample. The training sample with the smallest distance is the closest to the unknown sample, and the unknown sample may be categorized based on this nearest neighbor's class.
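As a concrete illustration of Eq. (4.1) and the NN rule, the following minimal Python sketch (written for illustration here; the function and variable names are ours, not part of the original study) classifies an unknown sample by the label of its single nearest training sample:

    import math

    def euclidean(x, u):
        # Distance of Eq. (4.1) between two predictor vectors.
        return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

    def nearest_neighbor_label(unknown, samples, labels):
        # NN rule (K = 1): return the label of the closest training sample.
        distances = [euclidean(unknown, s) for s in samples]
        return labels[distances.index(min(distances))]

    # Toy example: two classes in a two-dimensional feature space.
    samples = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.1, 3.9)]
    labels = ["negative", "negative", "positive", "positive"]
    print(nearest_neighbor_label((3.8, 4.0), samples, labels))  # prints "positive"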
[Figure: known samples from two classes and an unknown sample; panels (a) and (b)]
Fig 4.1 KNN decision rule

Fig 4.1 illustrates the KNN decision rule for K = 1 and K = 3 for a set of samples divided into two classes. In Fig 4.1(a) an unknown sample is categorized using only one known sample; in Fig 4.1(b) more than one known sample is used. In the latter case the parameter K is set to 3, so the closest three samples are considered for classifying the unknown one. Two of them belong to the same class, whereas only one belongs to the other class. In both cases, the unknown sample is classified as belonging to the class on the left. Fig 4.2 shows the pseudocode for the KNN algorithm.
    Input:  finite set A (query samples), finite set B (training samples), k,
            labeling function c : B -> {1, 2, ..., n}
    Output: classification r : A -> {1, 2, ..., n}
    Begin
      For each x in A do
        Let L := {}
        For each b in B, add (d(x, b), c(b)) to L
        Sort the elements in L by their first components
        Take the class labels of the first k elements of L
        Let r(x) be the class with the highest number of occurrences
      End for
      Return r
    End

Fig 4.2 Pseudo code for KNN

The classifier performance is principally controlled by the choice of K and, in addition, by the distance metric applied [20-25]. This evaluation is influenced by the sensitivity of choosing the neighborhood size K, since the local region radius is determined by the distance of the K-th nearest neighbor to the query, and different values of K yield different conditional class probabilities.
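The Fig 4.2 procedure translates directly into a short Python routine. This is a minimal sketch of ours, not code from the study; any distance function d (such as those in the next subsection) can be passed in:

    from collections import Counter

    def knn_classify(queries, train, train_labels, k, d):
        # Fig 4.2: label every query by the majority class of its k
        # nearest training samples under the distance function d.
        result = {}
        for x in queries:
            # Pair each training sample's distance to x with its label.
            L = sorted((d(x, b), c) for b, c in zip(train, train_labels))
            top_k_labels = [c for _, c in L[:k]]
            result[x] = Counter(top_k_labels).most_common(1)[0][0]
        return result

With k = 3 this reproduces the majority-vote behavior illustrated in Fig 4.1(b).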
Distance Metric

KNN makes its estimate from the K neighbors closest to the query point. Accordingly, to make predictions with KNN, a metric must be defined for measuring the distance between the query point and the cases in the sample. A familiar choice for this distance is the Euclidean metric; other measures include the squared Euclidean, city-block, and Chebychev distances. Table 4.1 presents these distance metrics and their formulas.

Table 4.1 Distance metrics employed in KNN (x = query point, p = data point from the sample)

    Distance Metric      Formula
    Euclidean            D(x, p) = \sqrt{\sum_i (x_i - p_i)^2}
    Euclidean squared    D(x, p) = \sum_i (x_i - p_i)^2
    City-block           D(x, p) = \sum_i |x_i - p_i|
    Chebychev            D(x, p) = \max_i |x_i - p_i|

K-Nearest Neighbor Predictions

After choosing the value of K, predictions are made from the K nearest neighbor samples. For regression, the KNN prediction is the average of the outcomes of the K nearest neighbors:

    y = \frac{1}{K} \sum_{i=1}^{K} y_i    (4.2)

where y_i is the outcome of the i-th nearest case in the sample and y is the prediction (result) for the query point. In classification problems, the KNN prediction is made by a voting scheme in which the winning class is used to label the query. Generally, the K neighbors have equal influence on the prediction regardless of their relative distance from the query point. An alternative methodology is to use arbitrarily large values of K, with more weight given to the cases nearest to the query point. This is accomplished by using 'distance weighting'.
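The Table 4.1 metrics and the averaging rule of Eq. (4.2) can be sketched in a few lines of Python. This is an illustration of ours, assuming x and p are equal-length numeric sequences:

    def euclidean(x, p):
        return sum((xi - pi) ** 2 for xi, pi in zip(x, p)) ** 0.5

    def euclidean_squared(x, p):
        return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

    def city_block(x, p):
        return sum(abs(xi - pi) for xi, pi in zip(x, p))

    def chebychev(x, p):
        return max(abs(xi - pi) for xi, pi in zip(x, p))

    def knn_regression(query, train_x, train_y, k, d=euclidean):
        # Eq. (4.2): average the outcomes of the k nearest neighbors.
        nearest = sorted(zip(train_x, train_y), key=lambda pair: d(query, pair[0]))[:k]
        return sum(y for _, y in nearest) / k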
Distance Weighting

Since KNN forecasts rest on the belief that items close in distance are potentially similar, it is sensible to discriminate between the K nearest neighbors when making a prediction, i.e., to let the closest points among the K nearest neighbors have a greater say in the result for the query point. This can be attained by introducing a set of weights W, one for every nearest neighbor, defined by the relative closeness of each neighbor with respect to the query point:

    W(x, p_i) = \frac{\exp(-D(x, p_i))}{\sum_{i=1}^{K} \exp(-D(x, p_i))}    (4.3)

where D(x, p_i) is the distance between the query point x and the i-th case p_i of the sample. It is clear that weights defined in this manner satisfy

    \sum_{i=1}^{K} W(x, p_i) = 1    (4.4)

Thus, for regression problems, we have

    \hat{y} = \sum_{i=1}^{K} W(x, p_i) \, y_i    (4.5)

For classification problems, the class variable with the highest value of the above sum is taken. It is also apparent from the above equation that when K > 1, one can characterize the standard deviation of predictions in regression tasks using

    \sigma = \sqrt{\sum_{i=1}^{K} W(x, p_i) \, (y_i - \hat{y})^2}    (4.6)
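A short Python sketch of Eqs. (4.3) to (4.5) follows. It is our own illustration; the exponential kernel matches the weighting as reconstructed above:

    import math

    def distance_weights(query, neighbors, d):
        # Eq. (4.3): weights from relative closeness; by construction
        # they sum to 1, as stated in Eq. (4.4).
        raw = [math.exp(-d(query, p)) for p in neighbors]
        total = sum(raw)
        return [w / total for w in raw]

    def weighted_knn_regression(query, neighbors, outcomes, d):
        # Eq. (4.5): distance-weighted average of the neighbors' outcomes.
        W = distance_weights(query, neighbors, d)
        return sum(w * y for w, y in zip(W, outcomes))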
Some of the merits of KNN are as follows: it is easy to use; it is resilient to noisy training samples, particularly if the inverse square of the weighted distance is used as the distance measure; and it is effective when the training data is vast. In spite of these advantages, it has a few demerits: a) it is computationally expensive, as the distance from each query example to all training samples must be found; b) it requires memory in proportion to the size of the training set; c) it has a low precision rate on multidimensional datasets; d) the parameter K, the number of nearest neighbors, must be determined; e) in distance-based learning it is not clear which sort of distance to use; and f) it must be decided which labels are ideal for producing the best results. Therefore, to overcome the low precision rate of KNN, the Modified KNN (MKNN) is proposed in this research work. The MKNN preprocesses the training set before using it and finds the validity of every training sample; the final classification is then made by applying a weighted KNN that uses the validity as a multiplicative factor.

4.3 Modified K-Nearest Neighbor (MKNN)

In this research, the Modified K-Nearest Neighbor algorithm is used for prediction of stock index movement. The fundamental idea of the presented technique is to assign the class label of the queried instance using K validated training data points: the validity of every sample in the training set is calculated, and a weighted KNN is then performed on the validated training samples. Fig 4.3 shows the pseudocode of the MKNN.

    Output_label := MKNN(train_set, test_sample)
    Begin
      For i := 1 to train_size
        Validity(i) := compute validity of the i-th training sample
      End for
      Output_label := Weighted_KNN(Validity, test_sample)
      Return Output_label
    End

Fig 4.3 Pseudo code of the MKNN

Data and Sources of Data

This study examines the monthly change in the closing values of the NSE NIFTY and BSE stock data according to the following predictors: open price, high price, low price and close price. The NSE NIFTY and BSE stock index values are acquired from the NSE and BSE websites respectively for the period from January 2010 to December 2013. The data is split into two subsets in an 80:20 ratio, where the in-sample or training data spans from January 2010 to December 2012 and the data for the remaining period, from January 2013 to December 2013, is used as the out-of-sample or test data.

Preprocessing the Data

When the data was first gathered, all the values of the chosen attributes were continuous numeric values. Data conversion was applied by generalizing the data to a higher-level concept so that all the values became discrete. The rule made to convert the numeric values of each attribute to discrete values relied upon the previous day's closing price of the stock. If the values of the attributes open, high, low, and close were greater than the value of the attribute on the previous trading day, the numeric values of the attribute were replaced by the value positive. If the values of the attributes mentioned above were less than the value of the attribute on the previous day, the numeric values were replaced by negative. If the values of those attributes were equal to the previous value, they were replaced by the value equal.
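The conversion rule can be sketched as follows. This is an illustrative helper of ours; the attribute names follow the description above, and the comparison baseline is assumed to be the previous day's closing price:

    def discretize(day, previous_close):
        # Replace each numeric attribute by 'positive', 'negative' or
        # 'equal' relative to the previous day's closing price.
        converted = {}
        for attr in ("open", "high", "low", "close"):
            if day[attr] > previous_close:
                converted[attr] = "positive"
            elif day[attr] < previous_close:
                converted[attr] = "negative"
            else:
                converted[attr] = "equal"
        return converted

    # Example: yesterday's close was 100.0.
    print(discretize({"open": 99.5, "high": 101.2, "low": 98.7, "close": 100.0}, 100.0))
    # {'open': 'negative', 'high': 'positive', 'low': 'negative', 'close': 'equal'}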
Building the Model

After the data had been arranged and converted, the next step was to build the forecast model using the MKNN. The MKNN was chosen because the construction of MKNN classifiers does not require any domain knowledge, making it fitting for exploratory knowledge discovery; it can also deal with high-dimensional data. In the MKNN, each sample in the training set must first be validated. The validity of each point is found according to its neighbors, and the validation procedure is performed for all training samples. To validate a sample point in the training set, the H nearest neighbors of the point are considered. Among the H nearest neighbors of a training sample x, validity(x) counts the number of points whose label is the same as the label of x. The formula proposed to calculate the validity of every point in the training set is

    Validity(x) = \frac{1}{H} \sum_{i=1}^{H} S(lbl(x), lbl(N_i(x)))    (4.7)

where H is the number of considered neighbors, lbl(x) returns the true class label of the sample x, and N_i(x) stands for the i-th nearest neighbor of the point x. The function S takes into account the similarity between the label of the point x and that of its i-th nearest neighbor:

    S(a, b) = 1 if a = b; S(a, b) = 0 if a \neq b    (4.8)
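Eqs. (4.7) and (4.8) amount to a short validation pass over the training set. A minimal Python sketch of ours:

    def similarity(a, b):
        # Eq. (4.8): 1 when the labels match, 0 otherwise.
        return 1 if a == b else 0

    def validity(train_x, train_y, H, d):
        # Eq. (4.7): for each training sample, the fraction of its H
        # nearest neighbors that share its true class label.
        values = []
        for i, (x, label) in enumerate(zip(train_x, train_y)):
            # Rank all other training samples by distance to x.
            others = sorted(
                (j for j in range(len(train_x)) if j != i),
                key=lambda j: d(x, train_x[j]),
            )
            values.append(sum(similarity(label, train_y[j]) for j in others[:H]) / H)
        return values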
Prediction Model

The prediction model considers the opening value, high value, low value and closing value of the market index as independent variables and the next day's closing value as the dependent variable. The MKNN identifies the k nearest neighbors in the training data set in terms of the Euclidean distance with respect to the day for which the prediction is to be made. Once the k nearest neighbors are identified, the prediction for that day is computed as the average of the next day's closing prices of those neighbors. The MKNN employs weighted KNN on the test data set for predicting the next day's closing value. The output of the predictive model is compared with the actual values of the test dataset for validation.

Applying Weighted KNN

Each of the K samples is given a weighted vote that is usually equal to some decreasing function of its distance from the unknown sample. For example, the vote might be set equal to 1/(d_e + 1), where d_e is the Euclidean distance. These weighted votes are then summed for each class, and the class with the largest total vote is chosen. This distance-weighted KNN technique is very similar to the window technique for estimating density functions; for example, using a weight of 1/(d_e + 1) is equivalent to the window technique with a window function of 1/(d_e + 1) if K is chosen equal to the total number of training samples. In the MKNN method, the raw weight of each neighbor is first computed using

    W(i) = \frac{1}{d_e + 0.5}    (4.9)

Then the validity of that training sample is multiplied by its raw weight, which is based on the Euclidean distance. Thus, in the MKNN method, the weight of each neighbor sample is derived according to

    W(i) = Validity(i) \times \frac{1}{d_e + 0.5}    (4.10)

where W(i) and Validity(i) stand for the weight and the validity of the i-th nearest sample in the training set.
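Putting Eqs. (4.9) and (4.10) together with the voting described above gives the core of the MKNN classifier. This is a sketch of ours; the 0.5 smoothing constant in the denominator follows the common MKNN formulation and is an assumption, since the original equation image is not part of the text:

    def mknn_classify(query, train_x, train_y, val, k, d):
        # Rank training samples by distance and keep the k nearest.
        ranked = sorted(range(len(train_x)), key=lambda j: d(query, train_x[j]))[:k]
        votes = {}
        for j in ranked:
            # Eq. (4.10): validity times the raw distance weight of Eq. (4.9).
            w = val[j] * (1.0 / (d(query, train_x[j]) + 0.5))  # 0.5: assumed constant
            votes[train_y[j]] = votes.get(train_y[j], 0.0) + w
        # The class with the largest total weighted vote wins.
        return max(votes, key=votes.get)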
Classifier Model

The classifier model considers the opening value, high value, low value, closing value and returns of the market index as independent variables and the next day's class as the dependent variable. The return for a day is calculated as

    R_t = \frac{v_t - v_{t-1}}{v_{t-1}}    (4.11)

where v_t is the closing value of the index on the current day and v_{t-1} is the closing value of the index on the previous day. If the next day's return is positive, the next day's class is labeled bull; otherwise it is labeled bear. The output of the classifier is compared with the real classes of the test data set to evaluate the effectiveness of the approach.
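Eq. (4.11) and the bull/bear labeling rule fit in a few lines (our illustration):

    def daily_return(v_t, v_prev):
        # Eq. (4.11): return relative to the previous day's closing value.
        return (v_t - v_prev) / v_prev

    def next_day_class(today_close, next_close):
        # Label the next day 'bull' when its return is positive, else 'bear'.
        return "bull" if daily_return(next_close, today_close) > 0 else "bear"

    print(next_day_class(100.0, 101.5))  # prints "bull"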
4.4 Empirical Results

The examined data sample comprises daily returns from January 2010 to December 2013 of three stock market indices: BSE Oil and Gas, CNX100 and CNX NIFTY. Data samples are collected from the historical values of the NSE NIFTY and BSE (Bombay Stock Exchange) data. The total data set is split into two parts, one for training the network and the remainder for testing its performance. In this experiment, the stock index data from January 2010 to December 2012 is used for training and the data from January 2013 to December 2013 is used to test the performance of the proposed approach.

Performance Measures

The following performance measures are used to gauge the performance of the trained forecasting model on the test data: the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE), R-squared (R^2), adjusted R-squared (R_A^2) and the Hannan-Quinn information criterion (HQ). Table 4.2 lists these performance measures and the related formulas used to evaluate the effectiveness of the proposed approach.

Table 4.2 Performance criteria and the related formulas
(y_i = real value, \hat{y}_i = estimated value, \bar{y} = mean value, n = number of observations, SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2)

    Performance Criteria                        Formula
    Mean Squared Error (MSE)                    MSE = SSR / n
    Root Mean Squared Error (RMSE)              RMSE = \sqrt{MSE}
    R-squared (R^2)                             R^2 = 1 - SSR / \sum_{i=1}^{n} (y_i - \bar{y})^2
    Adjusted R-squared (R_A^2)                  R_A^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), p = number of predictors
    Hannan-Quinn Information Criterion (HQ)     HQ = n \ln(SSR/n) + 2k \ln(\ln n), k = number of model parameters
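These criteria are straightforward to compute. A hedged Python sketch of ours; the p and k defaults are placeholders, not values taken from the study:

    import math

    def performance(y_true, y_pred, p=4, k=4):
        # Table 4.2 criteria; p = number of predictors, k = number of
        # model parameters for HQ (illustrative defaults).
        n = len(y_true)
        mean = sum(y_true) / n
        ssr = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
        sst = sum((y - mean) ** 2 for y in y_true)
        mse = ssr / n
        r2 = 1 - ssr / sst
        return {
            "MSE": mse,
            "RMSE": math.sqrt(mse),
            "R2": r2,
            "Adjusted R2": 1 - (1 - r2) * (n - 1) / (n - p - 1),
            "HQ": n * math.log(ssr / n) + 2 * k * math.log(math.log(n)),
        }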
Results: Prediction Model

Fig 4.4 presents the results for the returns (close price) for the year 2013 of the BSE Oil and Gas index obtained using the Modified KNN (MKNN), and Table 4.3 shows the error rate of the proposed approach under the various performance measures.

[Figure: actual vs. predicted close price, January to December 2013]
Fig 4.4 BSE Predicted Close Price Value

Table 4.3 Error Rate of BSE

    Test Criteria                               Error Rate (%)
    Mean Squared Error (MSE)                    3.87
    Root Mean Squared Error (RMSE)              5.98
    R-squared (R^2)                             0.35
    Adjusted R-squared (R_A^2)                  1.67
    Hannan-Quinn Information Criterion (HQ)     5.03
Fig 4.5 presents the results for the returns (close price) for the year 2013 of the NSE CNX100 index, and Table 4.4 shows the error rate of the MKNN approach under the various performance measures.

[Figure: actual vs. predicted close price, January to December 2013]
Fig 4.5 Predicted Close Price of CNX100 Stock Index

Table 4.4 Error Rate of CNX100 Stock Index

    Test Criteria                               Error Rate (%)
    Mean Squared Error (MSE)                    3.67
    Root Mean Squared Error (RMSE)              4.98
    R-squared (R^2)                             0.38
    Adjusted R-squared (R_A^2)                  1.98
    Hannan-Quinn Information Criterion (HQ)     5.03
Fig 4.6 presents the results for the returns (close price) for the year 2013 of the NSE CNX NIFTY index, and Table 4.5 shows the error rate of the proposed approach under the various performance measures.

[Figure: actual vs. predicted close price, January to December 2013]
Fig 4.6 Predicted Close Price of CNX NIFTY Stock Index

Table 4.5 Error Rate of CNX NIFTY Stock Index

    Test Criteria                               Error Rate (%)
    Mean Squared Error (MSE)                    3.89
    Root Mean Squared Error (RMSE)              4.43
    R-squared (R^2)                             0.45
    Adjusted R-squared (R_A^2)                  1.78
    Hannan-Quinn Information Criterion (HQ)     5.03
Classification Model

The results obtained from the two classifier models, KNN and MKNN, for BSE Oil and Gas, CNX100 and CNX NIFTY are given below.

Table 4.6 Comparison of Classifier Models on the Test Dataset for BSE Oil and Gas

                              KNN                      MKNN
                              Instances   Accuracy     Instances   Accuracy
    Correctly classified      258         77.9%        294         88.8%
    Incorrectly classified    73          22.1%        37          11.2%

Table 4.6 shows that the MKNN correctly classifies the next day's index movement of the BSE Oil and Gas index for 294 instances out of the total of 331, an accuracy rate of 88.8%, and misclassifies 37 instances, an error rate of 11.2%. The KNN, in contrast, correctly classifies the next day's index movement for only 258 instances out of the total of 331, an accuracy rate of 77.9%, and misclassifies 73 instances, an error rate of 22.1%.

Table 4.7 Comparison of Classifier Models on the Test Dataset for CNX100

                              KNN                      MKNN
                              Instances   Accuracy     Instances   Accuracy
    Correctly classified      254         76.89%       294         88.01%
    Incorrectly classified    77          22.87%       41          12.79%

Table 4.7 shows that the MKNN correctly classifies the next day's index movement of the CNX100 index for 294 instances out of the total of 331, an accuracy rate of 88.01%, and misclassifies 41 instances, an error rate of 12.79%. The KNN, in contrast, correctly classifies the next day's index movement for only 254 instances out of the total of 331, an accuracy rate of 76.89%, and misclassifies 77 instances, an error rate of 22.87%.
Table 4.8 Comparison of Classifier Models on the Test Dataset for CNX NIFTY

                              KNN                      MKNN
                              Instances   Accuracy     Instances   Accuracy
    Correctly classified      256         77.01%       295         88.57%
    Incorrectly classified    75          22.45%       36          11.98%

Table 4.8 shows that the MKNN correctly classifies the next day's index movement of the CNX NIFTY index for 295 instances out of the total of 331, an accuracy rate of 88.57%, and misclassifies 36 instances, an error rate of 11.98%. The KNN, in contrast, correctly classifies the next day's index movement for only 256 instances out of the total of 331, an accuracy rate of 77.01%, and misclassifies 75 instances, an error rate of 22.45%.

Table 4.9 Confusion Matrices for BSE Oil and Gas

                      KNN                      MKNN
                      Predicted Class          Predicted Class
    Actual Class      Bull       Bear          Bull       Bear
    Bull              -          -             150        14
    Bear              -          -             9          158

It is seen from Table 4.9 that the MKNN correctly classifies 150 bull-class instances out of the total of 164 bull-class instances and 158 bear-class instances out of the total of 167 bear-class instances. The KNN shows lower performance than the MKNN model.
Table 4.10 Confusion Matrices for CNX100

                      KNN                      MKNN
                      Predicted Class          Predicted Class
    Actual Class      Bull       Bear          Bull       Bear
    Bull              ~100       ~64           149        15
    Bear              ~22        ~145          9          158

Table 4.10 shows that the MKNN correctly classifies 149 bull-class instances out of the total of 164, while the KNN classifies only about 100; the MKNN also correctly classifies 158 bear-class instances out of the total of 167, against about 145 for the KNN.

Table 4.11 Confusion Matrices for CNX NIFTY

                      KNN                      MKNN
                      Predicted Class          Predicted Class
    Actual Class      Bull       Bear          Bull       Bear
    Bull              ~101       ~63           151        13
    Bear              ~21        ~146          11         156

Table 4.11 shows that the MKNN correctly classifies 151 bull-class instances out of the total of 164, while the KNN classifies only about 101; the MKNN also correctly classifies 156 bear-class instances out of the total of 167, against about 146 for the KNN.
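Reading class-wise and overall rates off these confusion matrices is straightforward. The following helper is our own illustration, applied to the MKNN cells recoverable from Table 4.9:

    def confusion_stats(m):
        # m = [[bull_as_bull, bull_as_bear], [bear_as_bull, bear_as_bear]]
        bull_rate = m[0][0] / (m[0][0] + m[0][1])
        bear_rate = m[1][1] / (m[1][0] + m[1][1])
        overall = (m[0][0] + m[1][1]) / sum(sum(row) for row in m)
        return bull_rate, bear_rate, overall

    # MKNN on BSE Oil and Gas (Table 4.9): 150 of 164 bull and 158 of 167
    # bear instances classified correctly.
    print(confusion_stats([[150, 14], [9, 158]]))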