Data Imbalance Problem solving for SMOTE Based Oversampling: Study on Fault Detection Prediction Model in Semiconductor Manufacturing Process
Vol. 133 (Information Technology and Computer Science 2016)

Jaekwon Kim 1, Youngshin Han 2,* and Jongsik Lee 1,*
1 Dept. of Computer Science and Information Engineering, Inha University, South Korea (Jaekwon Kim and Jongsik Lee, jslee@inha.ac.kr)
2 Dept. of Computer Engineering, Sungkyul University, South Korea (Youngshin Han, hanys@sungkyul.ac.kr)

Abstract. Predicting faults in the FAB (wafer fabrication) process of semiconductor manufacturing can improve product quality and reliability, to a degree that depends on classification performance. However, faults occur only occasionally in the FAB process, and most wafers pass. The pass/fail classes are therefore imbalanced. Under such imbalance, prediction models struggle to predict the fail class because they become biased toward the majority (pass) class. In this paper, we propose a SMOTE (Synthetic Minority Oversampling Technique) based oversampling method to solve the data imbalance problem. The proposed method resolves the imbalance between pass and fail by oversampling the minority fail class. In addition, we apply it to fault detection prediction models and measure their performance.

Keywords: Semiconductor manufacturing process, Fault detection prediction, Oversampling, SMOTE

1 Introduction

The probe test is the step that classifies each wafer as pass or fail (regular/irregular) after the FAB process has finished [1]. Until now, semiconductor yield has been predicted using the FAB process and the probe test. However, the manufacturing process has caused lead time and cost problems, because manufacturing technology has advanced and the number of chips on a wafer has increased.
Therefore, predicting the final test yield in the semiconductor industry requires research on reducing lead time and cost. The complex wafer manufacturing process can introduce defects, which may cause products to fail. Hence, the semiconductor manufacturing process needs a fault detection and classification method.

* Corresponding authors: Youngshin Han and Jongsik Lee.
ISSN: ASTL Copyright 2016 SERSC

In other words, a fault detection prediction model can quickly predict the outcome of the final product and improve quality and reliability [2]. Resolving data imbalance improves the classification accuracy of a fault detection prediction model [3]. In semiconductor manufacturing the fault class is small, which causes an imbalance between the pass and fail classes of the final product. A prediction model therefore needs a data sampling method that can resolve the imbalance. In general, under-sampling or oversampling is used, depending on the degree of imbalance. However, in an unbalanced dataset some classes may also contain overlapping records. In that case, classification is strongly affected by both the amount of overlap and the degree of imbalance, so the oversampling method must also address the overlap problem.

In this paper, we propose SMOTE (Synthetic Minority Over-sampling Technique) [4] based oversampling for data imbalance in the semiconductor manufacturing process. The proposed method resolves the imbalance between the classes to improve the accuracy of the prediction model in fault detection. This study uses the SECOM dataset [5], preprocesses the data, and generates prediction models.

2 Method

In this paper, we use a SMOTE based sampling technique to improve the performance of the predictive model. SMOTE generates new minority class data using KNN (k-nearest neighbor), balancing the minority and majority classes. The framework for generating the fault detection prediction model is shown in Figure 1.

Fig. 1. Framework

The proposed framework consists of two phases. The first phase is preprocessing, which prepares the SECOM dataset for classification by the prediction models.
Preprocessing consists of data cleaning and feature selection on the SECOM dataset. The dataset is then divided into a training set (70%) and a testing set (30%). Oversampling uses SMOTE, configured for a 1:2 balance (minority class: 33.4%, majority class: 66.6%). The second phase generates and evaluates the prediction models. The models are built from the training set using LR (Logistic Regression), ANN (Artificial Neural Network), DT (Decision Tree, C4.5), and RF (Random Forest), and are evaluated with the confusion matrix.

The procedure for generating the fault prediction model, including SMOTE based oversampling, is as follows:

Data cleaning
1) For each attribute, count the not-available or missing values. If more than 60% of the records are missing for an attribute, remove that attribute.

Feature selection
2) Apply PCA (Principal Component Analysis) based feature selection.

Oversampling
3) Balance the pass/fail classes using SMOTE based oversampling. The SMOTE pseudo code is shown in Table 1.

Table 1. SMOTE pseudo code [4]

Start
for each minority class sample i
    Compute the k-nearest neighbors of sample i (k = 10), and save their indices.
end for
while more synthetic samples are needed
    Choose a random number between 1 and k, call it nn. This selects one of the 10 nearest neighbors.
    for j <- 1 to number of attributes
        dif = MinorityClassSample[neighbor(nn)][j] - MinorityClassSample[i][j]
        gap = rand() // between 0 and 1
        NewClassSample[newindex][j] = MinorityClassSample[i][j] + gap * dif
    end for
    newindex++
end while
End

Prediction model build
5) Build fault prediction models with LR, ANN, DT (C4.5), and RF.
6) Using the confusion matrix, compare precision, recall (sensitivity), and F-measure. The confusion matrix is shown in Figure 3 (TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative).
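The interpolation step of the SMOTE pseudo code can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names `smote` and `synthetic_needed`, the `seed` parameter, and the choice to pick the base sample at random on each iteration are assumptions made here for brevity. The helper also assumes that "1:2 balance" means the minority class should reach half the size of the majority class.

```python
import math
import random


def synthetic_needed(n_minority, n_majority, ratio=0.5):
    """Synthetic minority samples needed so that minority/majority reaches `ratio`.

    Assumption: a 1:2 balance is read as minority = 0.5 * majority.
    """
    return max(0, math.ceil(ratio * n_majority) - n_minority)


def smote(minority, n_synthetic, k=10, seed=0):
    """Create `n_synthetic` points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # For every minority sample, store the indices of its k nearest neighbours.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base sample (picked at random here)
        nn = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                 # random number in [0, 1)
        base, other = minority[i], minority[nn]
        # new = base + gap * (neighbour - base), attribute by attribute
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    fail_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy data
    new_points = smote(fail_samples, n_synthetic=6, k=2)
    print(len(new_points))             # 6
    print(synthetic_needed(73, 1026))  # 440
```

Under that reading of the ratio, a training set with 1026 pass and 73 fail records would need 440 synthetic fail records, which gives a minority share of about one third, close to the 33.4% stated above. Every synthetic point lies on the line segment between a real minority sample and one of its neighbours, so no value falls outside the range already covered by the minority class.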
Fig. 3. Confusion matrix

3 Experimental

We used the SECOM dataset [5] for the experiment. The SECOM dataset consists of 1567 records and 590 attributes. The fail class has 104 records, and the remaining records belong to the pass class (balance: 6.77% : 93.23%). Data cleaning removed 271 of the 590 attributes because more than 60% of their values were 'NaN' (not available); in the end, 309 attributes were used. PCA based feature selection then reduced these 309 attributes to the final 35 features. The SECOM dataset was split into a training set of 70% (1099 records in total; pass: 1026, fail: 73) and a testing set of 30% (468 records in total; pass: 437, fail: 31). The fault prediction models (LR, ANN, DT, and RF) were generated from the training set. To evaluate the oversampling, SMOTE 1:2 is compared with RUS (Random Under-Sampling) [6] 1:2. The confusion matrix results are shown in Table 2, and the results of each model are shown in Figure 4.

Table 2. Confusion matrix result
Columns: Sampling Method, Prediction Model, TP, FP, FN, TN
Rows: SMOTE 1:2 (LR, ANN, DT, RF); RUS 1:2 (LR, ANN, DT, RF)
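The performance measures used in this study follow directly from the four confusion matrix counts. The sketch below uses the standard definitions. The function name `classification_metrics` is hypothetical, the fail class is assumed to be the positive class, and the example counts are made up for illustration rather than taken from Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive five measures from confusion-matrix counts.

    Assumption: the positive class is the minority (fail) class.
    """

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)              # recall on the positive class
    specificity = ratio(tn, tn + fp)              # recall on the negative class
    accuracy = ratio(tp + tn, tp + fp + fn + tn)  # share of correct predictions
    precision = ratio(tp, tp + fp)                # share of positive calls that are right
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_measure": f_measure,
    }


# Made-up counts for illustration only; they are not results from the paper.
m = classification_metrics(tp=10, fp=30, fn=20, tn=440)
print(round(m["accuracy"], 3))   # 0.9
print(round(m["f_measure"], 3))  # 0.286
```

The example shows why accuracy alone is a weak guide on imbalanced data: accuracy is 0.9 even though only a third of the positive cases are detected, which is why sensitivity, precision, and F-measure are reported alongside it.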
Fig. 4. Performance measure

Averaged over all models, sensitivity was higher for RUS (SMOTE: 0.259). The SMOTE averages for the remaining measures were 0.895 for specificity, 0.854 for accuracy, 0.154 for precision, and 0.186 for F-measure. Overall, SMOTE performed better than RUS. Although there are differences depending on the classification model, SMOTE is generally effective for building the fault detection prediction model. Thus SMOTE based oversampling can be used effectively in the semiconductor manufacturing process.

4 Conclusion

In the semiconductor manufacturing process, substantial costs depend on the pass/fail classification. In this study, we proposed SMOTE (Synthetic Minority Over-sampling Technique) based oversampling to resolve the data imbalance between pass and fail. The proposed method was applied to the SECOM dataset [5], with LR, ANN, DT, and RF as the classification models. SMOTE based oversampling offered better performance than the compared sampling method (RUS). Future work should study ways to increase the accuracy of the classification prediction.
Acknowledgment. This work was funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1C1A2A ).

References

1. Kim, K.-H. and Baek, J.: A Prediction of Chip Quality using OPTICS (Ordering Points to Identify the Clustering Structure)-based Feature Extraction at the Cell Level. J. of the Korean Institute of Industrial Engineers, vol. 40, no. 3 (2014)
2. Kerdprasop, K. and Kerdprasop, N.: Feature selection and boosting techniques to improve fault detection accuracy in the semiconductor manufacturing process. Proc. of Inter. MultiConference of Engineers and Computer Scientists, vol. 1 (2011)
3. Liu, J., Hu, Q. and Yu, D.: A comparative study on rough set based class imbalance learning. Knowledge-Based Systems, vol. 21, no. 8 (2008)
4. Chawla, N., Bowyer, K., Hall, L. and Kegelmeyer, W. P.: SMOTE: synthetic minority over-sampling technique. J. of Artificial Intelligence Research, vol. 16 (2002)
5. SEmi COnductor Manufacturing. (2010)
6. Witten, I. H. and Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (2000)
More informationSeminars of Software and Services for the Information Society
DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI Master of Science in Engineering in Computer Science (MSE-CS) Seminars in Software and Services for the Information Society
More informationData Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy
Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department
More informationSubject. Dataset. Copy paste feature of the diagram. Importing the dataset. Copy paste feature into the diagram.
Subject Copy paste feature into the diagram. When we define the data analysis process into Tanagra, it is possible to copy components (or entire branches of components) towards another location into the
More informationClassification of weld flaws with imbalanced class data
Available online at www.sciencedirect.com Expert Systems with Applications Expert Systems with Applications 35 (2008) 1041 1052 www.elsevier.com/locate/eswa Classification of weld flaws with imbalanced
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationFast and Effective Spam Sender Detection with Granular SVM on Highly Imbalanced Mail Server Behavior Data
Fast and Effective Spam Sender Detection with Granular SVM on Highly Imbalanced Mail Server Behavior Data (Invited Paper) Yuchun Tang Sven Krasser Paul Judge Secure Computing Corporation 4800 North Point
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationStudy on Classifiers using Genetic Algorithm and Class based Rules Generation
2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules
More informationFraud Detection Using Random Forest Algorithm
Fraud Detection Using Random Forest Algorithm Eesha Goel Computer Science Engineering and Technology, GZSCCET, Bhatinda, India eesha1992@rediffmail.com Abhilasha Computer Science Engineering and Technology,
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationECE 5470 Classification, Machine Learning, and Neural Network Review
ECE 5470 Classification, Machine Learning, and Neural Network Review Due December 1. Solution set Instructions: These questions are to be answered on this document which should be submitted to blackboard
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationFacial Expression Classification with Random Filters Feature Extraction
Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationSNS College of Technology, Coimbatore, India
Support Vector Machine: An efficient classifier for Method Level Bug Prediction using Information Gain 1 M.Vaijayanthi and 2 M. Nithya, 1,2 Assistant Professor, Department of Computer Science and Engineering,
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationA Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence
2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da
More information