Using Support Vector Machines for Direct Marketing Models


International Journal of Engineering and Advanced Technology (IJEAT), ISSN: , Volume-4 Issue-4, April 2015

A. Nachev, T. Teodosiev

Abstract

This paper presents a case study of data mining modeling for direct marketing, based on support vector machines. We address some gaps in previous studies, namely: dealing with randomness and 'lucky' set composition; the role of variable selection, data saturation, and controlling the problem of under-fitting and over-fitting; and selection of the kernel function and model hyper-parameters for optimal performance. In order to avoid overestimation of the model performance, we applied a double-testing procedure, which combines cross-validation and multiple runs. To illustrate the points discussed, we built predictive models, which outperform those discussed in previous studies.

Index Terms: classification, data mining, direct marketing, support vector machines.

I. INTRODUCTION

Today, banks are faced with various challenges in offering products and services to their customers, such as increasing competition, continually rising marketing costs, decreased response rates, and at the same time not having a direct relationship with their customers. In order to address these problems, banks aim to select those customers who are most likely to be potential buyers of the new product or service and build a direct relationship with them. In simple words, banks want to select the customers who should be contacted in the next marketing campaigns.

Response modeling is usually formulated as a binary classification problem. The customers are divided into two classes, respondents and non-respondents. Various classification methods (classifiers) have been used for response modeling, such as statistical and machine learning methods. They use historical purchase data to train and then identify customers who are likely to respond by purchasing a product.
Many data mining and machine learning techniques have been used to build decision support models capable of predicting the likelihood that a customer will respond to the offering. These models can perform well or not that well depending on many factors, an important one of which is how training of the model has been planned and executed. Recently, neural networks have been studied in [9], [10], [11], [12], and regarded as an efficient modeling technique. Decision trees have been explored in [10] and [11]. Support vector machines are also well performing models, discussed in [9], [12], and [13]. Many other modeling techniques and approaches, both statistical and machine learning, have been studied and used in the domain.

Manuscript received on April. A. Nachev, BIS, Cairnes Business School, NUI Galway, Galway, Ireland. T. Teodosiev, Department of Computer Science, Shumen University, Shumen, Bulgaria.

This paper focuses on support vector machines as a modeling technique and discusses factors which affect their performance and capabilities to predict. We extend the methodology used in [9], [10], and [11], addressing certain gaps which influence model performance.

The remainder of the paper is organized as follows: section II provides an overview of the data mining technique used; section III discusses the dataset used in the study, its features, and the preprocessing steps needed to prepare the data for experiments; section IV presents and discusses the experimental results; and section V gives conclusions.

II. SUPPORT VECTOR MACHINES

Support vector machines are common machine learning techniques. They belong to the family of generalized linear models, which achieve a classification or regression decision based on the value of the linear combination of input features. Using historical data along with supervised learning algorithms, SVM generate mathematical functions to map input variables to desired outputs for classification or regression prediction problems.
SVM, originally introduced by Vapnik [1], provide a new approach to the problem of pattern recognition with clear connections to the underlying statistical learning theory. They differ radically from comparable approaches such as neural networks because SVM training always finds a global minimum, in contrast to neural networks. SVM can be formalized as follows. Training data is a set of points of the form

$$D = \{(x_i, c_i) \mid x_i \in \mathbb{R}^p,\ c_i \in \{-1, 1\}\}_{i=1}^{N} \tag{1}$$

where $c_i$ is either 1 or -1, indicating the class to which the point $x_i$ belongs. Each data point $x_i$ is a p-dimensional real vector. During training a linear SVM constructs a (p-1)-dimensional hyperplane that separates the points into two classes (Fig. 1). Any hyperplane can be represented by $w \cdot x + b = 0$, where $w$ is a normal vector and $\cdot$ denotes the dot product. Among all possible hyperplanes that might classify the data, SVM selects the one with maximal distance (margin) to the nearest data points (support vectors).

When the classes are not linearly separable (there is no hyperplane that can split the two classes), a variant of SVM, called soft-margin SVM, chooses a hyperplane that splits the points as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. The method introduces slack variables, $\xi_i$, which measure the degree of misclassification of the datum $x_i$. Soft-margin SVM penalizes misclassification errors and employs a parameter (the soft-margin cost constant C) to control the cost of misclassification. Training a linear SVM classifier solves the constrained optimization problem:

$$\min_{w, b, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad c_i (w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ 1 \le i \le N \tag{2}$$

In dual form the optimization problem can be represented by

$$\min_{\alpha}\ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j c_i c_j\, (x_i \cdot x_j) - \sum_{i=1}^{N} \alpha_i \quad \text{s.t.} \quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{N} \alpha_i c_i = 0 \tag{3}$$

The resulting decision function $f(x) = w \cdot x + b$ has a weight vector $w = \sum_{i=1}^{N} \alpha_i c_i x_i$. Data points $x_i$ for which $\alpha_i > 0$ are called support vectors, since they uniquely define the maximum margin hyperplane. Maximizing the margin allows one to minimize bounds on generalization error.

If every dot product is replaced by a non-linear kernel function, the feature space is transformed into a higher-dimensional one, so although the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space (Fig. 2). The resulting classifier fits the maximum-margin hyperplane in the transformed feature space. The kernel function can be defined as:

$$k(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \tag{4}$$

where $\Phi(x)$ maps the vector $x$ to some other Euclidean space. The dot product $x_i \cdot x_j$ in the formulae above is replaced by $k(x_i, x_j)$, so that the SVM optimization problem in its dual form can be redefined as: maximize (in $\alpha_i$)

$$\tilde{L}(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j c_i c_j\, k(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i c_i = 0;\ \ 0 \le \alpha_i \le C;\ \ 1 \le i \le N \tag{5}$$

Maximizing $\tilde{L}(\alpha)$ is the same as minimizing its negative, which is the form used in (3).

A non-linear SVM is largely characterized by the choice of its kernel, and SVMs thus link the problems they are designed for with a large body of existing work on kernel-based methods. Some common kernel functions include:

Linear kernel:
$$k(x, x') = x \cdot x' \tag{6}$$

Polynomial kernel:
$$k(x, x') = (\text{scale} \cdot x \cdot x' + \text{offset})^{\text{degree}} \tag{7}$$

Gaussian RBF kernel:
$$k(x, x') = \exp(-\sigma \|x - x'\|^2) \tag{8}$$

Hyperbolic tangent kernel:
$$k(x, x') = \tanh(\text{scale} \cdot x \cdot x' + \text{offset}) \tag{9}$$

Laplacian kernel:
$$k(x, x') = \exp(-\sigma \|x - x'\|) \tag{10}$$

Figure 1. Maximum-margin hyperplane for an SVM trained with samples from two classes. Samples on the margin are support vectors.

The choice of kernel strongly depends on the task specifics and is usually made after an empirical survey. The kernel parameters are hyper-parameters of the model, and their tuning is important for the classifier performance.
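As an illustration of the five kernels in equations (6)-(10), the following is a minimal sketch in plain Python. The function names and the default parameter values are assumptions made for this example only; they are not the implementation or the settings used in the study. The helper `decision_value` shows how the kernel replaces the dot product in the decision function once the weight vector is written as a sum over support vectors.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def squared_distance(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    # Equation (6): plain dot product.
    return dot(x, y)


def polynomial_kernel(x, y, scale=1.0, offset=1.0, degree=2):
    # Equation (7); the default values are illustrative assumptions.
    return (scale * dot(x, y) + offset) ** degree


def rbf_kernel(x, y, sigma=0.1):
    # Equation (8): Gaussian RBF on the squared distance.
    return math.exp(-sigma * squared_distance(x, y))


def tanh_kernel(x, y, scale=1.0, offset=1.0):
    # Equation (9): hyperbolic tangent kernel.
    return math.tanh(scale * dot(x, y) + offset)


def laplacian_kernel(x, y, sigma=0.1):
    # Equation (10): like the RBF, but on the unsquared distance.
    return math.exp(-sigma * math.sqrt(squared_distance(x, y)))


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Kernelized decision function: sum_i alpha_i * c_i * k(x_i, x) + b."""
    return sum(a * c * kernel(sv, x)
               for sv, c, a in zip(support_vectors, labels, alphas)) + b
```

With the linear kernel, `decision_value` reduces to the linear form w · x + b; swapping in `rbf_kernel` gives a non-linear classifier without changing any other code.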
The major advantage of SVMs lies in their ability to map variables onto an extremely high-dimensional feature space. Because the size of the margin does not depend on the data dimension, SVM are robust with respect to data with high input dimension. However, it has been found that they do not favor large datasets, due to the demands imposed on virtual memory and the training complexity resulting from the use of such a scaled collection of data [2].

Figure 2. Kernel function: a linearly inseparable input space can be mapped to a linearly separable higher-dimensional space.

Work from Fei, Li, and Yong [3] highlighted three crucial problems in the use of support vector machines. These are: attaining the optimal input subset, the correct kernel function, and the optimal parameters of the selected kernel, all of which are prime considerations within this study.

III. DATA

A. Dataset

The direct marketing dataset used in this study was provided by Moro, Laureano, and Cortez [9], also available in [8]. It consists of 45,211 samples, each having 17 attributes, one of which is the class label. The attributes are both categorical and numeric and can be grouped as: demographical (age, education, job, marital status); bank information (balance, prior defaults, loans); and direct marketing campaign information (contact type, duration, days since last contact, outcome of the prior campaign for that client, etc.).

The dataset is unbalanced, because the successful samples corresponding to the class 'yes' are 5,289, which is 11.7% of all samples. There are no missing values. Further details about data collection, understanding, and initial preprocessing steps can be found in [9].

With reference to the standard for data mining projects CRISP-DM [4], we did two data pre-processing transformations: mapping non-numeric data into binary dummies and normalization.

Non-numeric categorical variables were decomposed into a series of dummy binary variables. For example, a single variable, such as education having possible values of "unknown", "primary", "secondary", and "tertiary", would be decomposed into four separate variables: unknown - 0/1; primary - 0/1; secondary - 0/1; and tertiary - 0/1. This is a full set of dummy variables, whose number corresponds to the number of possible values. However, in this example only three of the dummy variables are needed: if the values of three are known, the fourth is also known. For example, given that these four values are the only possible ones, we know that if the education is neither unknown, primary, nor secondary, it must be tertiary. Thus we map a categorical variable into dummies, which are one less than the number of possible values.
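The reduced dummy coding described above can be sketched as follows. This is a minimal illustration in plain Python, not the model building utility used in the study; the function name `reduced_dummies` and the choice of the last category as the all-zeros reference level are assumptions of this example.

```python
def reduced_dummies(values, categories):
    """Encode categorical values with len(categories) - 1 binary dummies.

    The last entry of `categories` is the reference level and is encoded
    as all zeros, so no dummy column is redundant.
    """
    unexpected = set(values) - set(categories)
    if unexpected:
        raise ValueError(f"values outside the category list: {sorted(unexpected)}")
    kept = categories[:-1]  # one dummy per category except the reference
    return [[1 if value == category else 0 for category in kept]
            for value in values]


EDUCATION = ["unknown", "primary", "secondary", "tertiary"]

# Three dummies (unknown, primary, secondary); "tertiary" is implied
# when all three are zero.
rows = reduced_dummies(["primary", "tertiary", "unknown"], EDUCATION)
```

Here `rows` holds one three-element vector per sample, and the sample with education "tertiary" is the one whose dummies are all zero.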
Using the reduced number of dummies we converted the original dataset variables into 42 numeric variables altogether, which is 6 less than the 48 variables used in [10] and [11]. There are two benefits of that: first, the model becomes simpler and faster; secondly, avoiding redundancy in data alleviates the SVM problem with demands imposed on virtual memory and the training complexity caused by a huge number of support vectors. The model building utility we used converts categorical variables to binary dummies without redundancy.

The second data transformation we did is related to normalization/scaling. This procedure attempts to give all input variables $x_a$ consistent values, regardless of their original scale and/or the different measurement units used, e.g. day (1-31) vs. duration in seconds (0-4918). If the data are left as they are, the training process gets influenced and biased by some dominating variables with large values. In order to address this, we did normalization (z-scoring) by:

$$x_{a,i}^{new} = \frac{x_{a,i} - \mu_a}{\sigma_a}, \quad i \in \{1, \dots, N\} \tag{11}$$

where $\mu_a$ is the mean and $\sigma_a$ is the standard deviation of $x_a$. After the transformation, each input variable has zero mean and unit standard deviation.

B. Variable Importance

Referring to the data preparation stage of the CRISP-DM project model for data mining [4], we explore how the presence or absence of the input variables presented to the model for training and testing affects the classifier performance. Removing the most irrelevant and redundant variables from the data may help to alleviate the effect of the curse of dimensionality, enhance the model generalization capabilities, speed up the learning process, and improve the model interpretability. Variable selection also helps to acquire a better understanding of the data and how they are related to each other. This work uses Sensitivity Analysis (SA) for ranking the variable importance to the model by measuring the effects on the output when the inputs are varied through their range of values [5].
While initially proposed for neural nets, SA is currently used with virtually any supervised learning technique, such as SVM [6]. The SA varies an input variable $x_a$ through its range with L levels, under a regular sequence from the minimum to the maximum value. Let $x_{a_j}$ denote the j-th level of input $x_a$. Let $\hat{y}$ denote the value predicted by the model for one data sample (x), let $\hat{y} = P(x)$ be the function of model responses, and let $\hat{y}_{a_j}$ be the response obtained when $x_a$ is set to its j-th level. Kewley, Embrechts, and Breneman propose in [7] three sensitivity measures, namely range ($S_r$), gradient ($S_g$) and variance ($S_v$):

$$S_r = \max\{\hat{y}_{a_j} : j \in \{1, \dots, L\}\} - \min\{\hat{y}_{a_j} : j \in \{1, \dots, L\}\}$$

$$S_g = \sum_{j=2}^{L} |\hat{y}_{a_j} - \hat{y}_{a_{j-1}}| \,/\, (L - 1)$$

$$S_v = \sum_{j=1}^{L} (\hat{y}_{a_j} - \bar{y}_a)^2 \,/\, (L - 1) \tag{12}$$

where $\bar{y}_a$ denotes the mean of the responses. The gradient is the only measure that is dependent on the order of the sensitivity responses. For all measures, the higher the value, the more relevant is the input $x_a$. The relative importance $r_a$ can be given by:

$$r_a = \varsigma_a \Big/ \sum_{i=1}^{M} \varsigma_i \tag{13}$$

where $\varsigma_a$ is the sensitivity measure for $x_a$ (e.g., range) and M is the number of input variables [5].

IV. EXPERIMENTS AND DISCUSSION

In order to explore the SVM performance for the task outlined and compare the model characteristics with those discussed in studies [9], [10], [11], we used the same dataset and did experiments consistently. Further to that, we extended the methodology, addressing the following gaps:

Validation and testing. Using validation and test sets in a double-testing procedure helps to avoid overestimation of the model performance.

Randomness and 'lucky' set composition. Random sampling is a fair way to select training and testing sets, but some 'lucky' draws can train the model much better than others. Using rigorous testing and validation procedures we solidify the conclusions made.

Choice of kernel function. We explore the SVM performance using five of the most common kernel functions discussed above.

Optimization of the model hyper-parameters. We tested the SVM performance with different hyper-parameters, some of which are specific to the kernel function used.

Variable selection. Further to identifying the importance of variables and their contribution to the classification task on the basis of SA, we applied a backward selection procedure to eliminate some input variables.

Data saturation. We also explored the capacity of the SVM to act in early stages of data collection, where lack of sufficient data may lead to underfitted models.

All experiments were conducted using the R environment [15], [16], and [17].

In order to select input variables for elimination, we did SA using three sensitivity measures: range, gradient, and variance, with 10 runs of the model per measure. Figs. 3-5 show the input variable importance, using a bar plot for each $r_a$ in equation (13), sorted in descending order. The whiskers in the figures represent confidence intervals. Two of the measures, range and variance, find loan to be the least significant input, while the gradient measure finds default to be the least significant. In any case, both input variables show similar insignificance to the classification task. Applying a backward variable selection procedure by eliminating first loan, we re-evaluated the input significances and further eliminated contact and campaign to obtain the best results.

For the sake of consistency with the previous studies, we first used 98% of the original dataset, which was further split randomly into training and validation sets in ratio 2:1. The remaining 2% was used as a final and independent test set. Using a test set in addition to the validation set solidifies the performance estimation, as the validation set specifics can influence the search for the best hyper-parameter values. Thus, an estimation based on the validation set only can be biased.
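The sensitivity measures of equation (12) and the relative importance of equation (13), which drive the variable ranking described above, can be sketched in plain Python as follows. This is an illustrative sketch only: the study used R, and the baseline sample, the toy model, and all function names here are assumptions made for the example.

```python
def level_grid(lowest, highest, levels):
    """Regular sequence of `levels` values from lowest to highest."""
    step = (highest - lowest) / (levels - 1)
    return [lowest + j * step for j in range(levels)]


def responses(predict, baseline, index, grid):
    """Model responses when input `index` sweeps the grid, others held fixed."""
    out = []
    for value in grid:
        probe = list(baseline)
        probe[index] = value
        out.append(predict(probe))
    return out


def s_range(y):
    return max(y) - min(y)


def s_gradient(y):
    # Mean absolute change between consecutive levels (order dependent).
    return sum(abs(y[j] - y[j - 1]) for j in range(1, len(y))) / (len(y) - 1)


def s_variance(y):
    # Sample variance of the responses over all L levels.
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / (len(y) - 1)


def relative_importance(sensitivities):
    """Equation (13): each sensitivity divided by the sum over all inputs."""
    total = sum(sensitivities)
    return [s / total for s in sensitivities]


def toy_model(x):
    # Hypothetical stand-in for a fitted classifier's response P(x).
    return 0.5 * x[0] + 0.1 * x[1]


grid = level_grid(0.0, 1.0, 5)
y0 = responses(toy_model, [0.0, 0.0], 0, grid)
y1 = responses(toy_model, [0.0, 0.0], 1, grid)
importance = relative_importance([s_range(y0), s_range(y1)])
```

In this toy setting the first input dominates the ranking, because the response changes five times as much over its range as over the range of the second input.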
In order to provide more realistic performance results and reduce the effect of lucky set composition, each version of the model was run 10 times with different randomly selected training and validation sets. For each run, a 3-fold cross-validation creates 3 model instances and averages their results. We iterated all those procedures 10 times per model, recording and averaging accuracy and AUC.

Another part of our experiments was to test how different levels of data saturation affect the SVM model performance. In a realistic situation, building a dataset can be an ongoing process, starting with a small dataset, which grows gradually over time.

Figure 3. Input importances using range sensitivity measure. Inputs in descending order of importance: duration, pdays, poutcome, age, balance, previous, campaign, month, education, job, day, marital, housing, contact, default, loan.

Figure 4. Input importances using variance sensitivity measure. Inputs in descending order of importance: duration, pdays, poutcome, age, balance, previous, month, campaign, housing, education, marital, job, contact, day, default, loan.

Figure 5. Input importances using gradient sensitivity measure. Inputs in descending order of importance: duration, pdays, poutcome, age, balance, previous, month, campaign, education, job, day, housing, marital, contact, loan, default.
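The double-testing procedure with repeated runs can be sketched as below. This is one possible reading, stated as an assumption: 2% of the samples are held out as the independent test set, and in each of the 10 runs the remaining 98% are reshuffled into 3 folds, so that every fold serves once as the validation set (one third) while the other two form the training set (two thirds, i.e. ratio 2:1). The `evaluate` callback is hypothetical and stands for fitting a model on the training indices and scoring it on the validation indices.

```python
import random
from statistics import mean


def split_test(n_samples, test_fraction, rng):
    """Shuffle sample indices and hold out a fraction as the final test set."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (development, test)


def repeated_cv(dev_indices, evaluate, runs=10, folds=3, seed=1):
    """Average a score over `runs` random partitions, each with k-fold CV."""
    rng = random.Random(seed)
    run_means = []
    for _ in range(runs):
        shuffled = list(dev_indices)
        rng.shuffle(shuffled)  # a new random set composition per run
        parts = [shuffled[f::folds] for f in range(folds)]
        scores = []
        for f in range(folds):
            validation = parts[f]
            training = [i for g, part in enumerate(parts) if g != f for i in part]
            scores.append(evaluate(training, validation))
        run_means.append(mean(scores))  # average of the 3 model instances
    return mean(run_means), run_means


def size_ratio(training, validation):
    # Placeholder score: reports the training/validation size ratio
    # instead of fitting a real model.
    return len(training) / len(validation)


development, test = split_test(45211, 0.02, random.Random(0))
overall, per_run = repeated_cv(development, size_ratio)
```

Replacing `size_ratio` with a function that trains an SVM and returns accuracy or AUC gives per-run averages whose spread indicates how much a 'lucky' partition can influence the result. The test indices are never passed to `repeated_cv`, so they stay untouched until the final evaluation.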

Performance of a classifier trained at different stages of the dataset lifetime is an important characteristic, as some modelling techniques may show better results than others at different data saturations. Table 1 summarizes the SVM performance in terms of accuracy and AUC with different levels of data saturation, ranging from 10% to 98% of the original dataset. Results show that the 20% saturation yields best average accuracy of % with some lucky sets achieving %. The table also shows a dropping performance when data saturation gets higher or lower. This can be interpreted as follows: increasing the size makes the model over-fit the training set, while decreasing it makes the model under-fit. The best model here outperforms the best models reported in previous studies [10], [11], which reached 90.09% max accuracy.

Table 1. SVM performance with fractions of the original dataset used for training.
Merit    98% set    80% set    60% set    40% set    20% set    10% set
ACC %
AUC

In data mining, classification performance is often measured using accuracy as the figure of merit. For a given operating point of a classifier, the accuracy is the total number of correctly classified instances divided by the total number of all available instances. Accuracy, however, varies dramatically depending on class prevalence. It can be a misleading estimator in cases where the most important class is typically underrepresented, such as the class 'yes' of those who respond positively to the direct marketing. For these applications, sensitivity and specificity can be more relevant performance estimators.

In order to address the accuracy deficiencies, we did Receiver Operating Characteristics (ROC) analysis [14]. In a ROC curve, the true positive rate (TPR), a.k.a. sensitivity, is plotted as a function of the false positive rate (FPR), a.k.a. 1-specificity, for different cut-off points. Each point on the ROC plot represents a sensitivity/specificity pair corresponding to a particular decision threshold.
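To make the point about class prevalence concrete, the sketch below uses the class counts reported for the dataset (5,289 'yes' out of 45,211 samples) and shows that a classifier which never predicts 'yes' still reaches about 88.3% accuracy while its sensitivity is zero. It also shows how each decision threshold yields one (FPR, TPR) point of a ROC curve. The function names and the small score list are assumptions made for this illustration, and the code assumes both classes are present in the labels.

```python
def confusion(labels, scores, threshold):
    """Counts (TP, FP, TN, FN) when score >= threshold is predicted as 1."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(labels, scores, threshold):
    """Accuracy, TPR (sensitivity) and FPR (1 - specificity) at a threshold."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    accuracy = (tp + tn) / len(labels)
    return accuracy, tp / (tp + fn), fp / (fp + tn)


def roc_points(labels, scores):
    """One (FPR, TPR) point per distinct score used as threshold."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        _, tpr, fpr = rates(labels, scores, threshold)
        points.append((fpr, tpr))
    return points


# A model that scores every customer 0.0 never predicts 'yes'.
bank_labels = [1] * 5289 + [0] * (45211 - 5289)
always_no = rates(bank_labels, [0.0] * len(bank_labels), threshold=0.5)

# Hypothetical scores for six customers, three of them true responders.
toy_points = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
```

The first element of `always_no` is the accuracy of the do-nothing model and the second is its sensitivity, which is why accuracy alone is not a sufficient figure of merit for this task.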
A test with perfect discrimination between the two classes has a ROC plot that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore the closer the ROC plot is to the upper left corner, the higher the overall accuracy of the test. The area under the ROC curve (AUC) is a common measure for the evaluation of discriminative power. AUC represents classifier performance over all possible threshold values, i.e. it is threshold independent.

We used the best performing 20% dataset for training and validation, internally split into training and validation sets in ratio 2:1. The fit algorithm runs 10 times with different random selections of training and validation sets. For each of those set compositions, the 3-fold cross-validation creates 3 model instances and averages the results. Fig. 6 shows the results by 10 colored lines and a thick black curve, which is the average of the 10 curves. Standard deviation bars, analogous to the whiskers, depict the variance of TPR.

Figure 6. ROC curves of 10 SVM models trained on 20% of the data (x-axis: false positive rate; y-axis: true positive rate; SVM AUC = 0.891, shown against the baseline). The black line represents average performance. Standard deviation bars measure variance.

Lift is another metric, often used to measure performance of marketing models. A good performance is when the response within the target is much better than the average for the population as a whole. In a cumulative lift chart (gains chart), the y-axis shows the percentage of true positive responses (TPR). Formally,

$$TPR = \text{sensitivity} = TP / (TP + FN) \tag{14}$$

where TP and FN are true positive and false negative predictions, respectively. Fig. 7 shows the cumulative lift charts of the 10 SVM models, run under the ROC analysis. The colors and whiskers in the curves have the same purpose as above.

Another way to characterize performance of a classifier is to look at how precision and recall change as the threshold changes. This can be visualized by a precision-recall curve (Fig. 8). The better the classifier, the closer its curve is to the top-right corner of the graph.
Formally,

$$\text{precision} = TP / (TP + FP) \tag{15}$$

$$\text{recall} = TP / (TP + FN) \tag{16}$$

In terms of a direct marketing task, precision is the percent of correctly identified 'yes' customers (who purchase the product) among all reported as 'yes'; recall is the percent of correctly identified 'yes' customers among those who are 'yes' in the test set. Recall and precision tend to be inversely related: as recall increases, precision decreases and vice versa.

Another factor that affects the SVM performance is the choice of kernel function and selecting proper values for its parameters, which along with the misclassification cost C, are hyper-parameters of the model. We explored empirically the SVM performance with the five kernels outlined in equations (6)-(10).

Figure 7. Cumulative LIFT curves of 10 SVM models (x-axis: rate of positive predictions; y-axis: true positive rate). The black line represents the average. Standard deviation bars measure variance.

Figure 8. Precision-Recall curves of 10 SVM models (x-axis: recall; y-axis: precision). The black line represents the average. Standard deviation bars measure variance.

Table 2 summarizes outcomes. We found that Gaussian RBF is the best performing kernel at two sets of hyper-parameter values: C=3, sigma=0.089; and C=3.5, sigma=0.091, both yielding max ACC=91.108%. Fig. 9 illustrates the grid search for optimal Gaussian RBF hyper-parameters in a range where the SVM provides the best discriminatory power. The highest two peaks correspond to the max accuracy obtained, while multiple ranges of both C and sigma obtain a very good accuracy above 90.7%, outperforming the SVM models discussed in the previous studies. The training set here is the 20% of the original one, and the three input variables loan, contact, and campaign were eliminated.

Table 2. SVM optimal hyper-parameters using different kernel functions.
Hyperparameter | linear | RBF | poly | tanh | Laplacian
C              |        |     |      |      |
sigma          | n/a    |     | n/a  | n/a  |
degree         | n/a    | n/a | 2    | n/a  | n/a
scale          | n/a    | n/a |      |      | n/a
offset         | n/a    | n/a |      |      | n/a
ACC max %      |        |     |      |      |

Figure 9. SVM performance with Gaussian RBF kernel and hyper-parameters C and sigma (σ).

Finally, we built Variable Effect Characteristic (VEC) curves [6] to explore the average impact of the four most important input variables $x_a$, which plot the $x_{a_j}$ values (x-axis) versus the $\hat{y}_{a_j}$ responses (y-axis). Between two consecutive $x_{a_j}$ values, the VEC plot uses a line (interpolation) for continuous values and a horizontal segment for categorical data. We ran the model 10 times and plotted the average values vertically. The whiskers added represent the confidence intervals. Figs. 10-13 show how duration, pdays, poutcome, and age contribute to the model performance.
From the duration VEC it is evident that last contacts with the shortest and longest duration contribute most to a positive outcome, whilst a moderate duration, typically about 2000 sec, contributes to a negative outcome. Similarly, the pdays VEC shows that the sooner the customer is contacted after the last contact, the better. The gap between the contacts can be extended up to one year, but any over-delayed contact is useless and contributes to a negative outcome.

Figure 10. VEC curve for the 'duration' input.

Figure 11. VEC curve for the 'pdays' input.

Figure 12. VEC curve for the 'poutcome' input (categories: success, unknown, failure, other).

In relation to the poutcome input, the VEC curve shows that customers who purchased the product or service are not likely to purchase it again and shouldn't be involved in the new direct marketing campaign, but there is a high chance to sell the product to any other customers. Finally, the age VEC curve shows that the marketing campaign is better targeted at mid-age customers between 40 and 50; there is a negligible chance to sell the product to elderly people, particularly above 70.

Figure 13. VEC curve for the 'age' input.

V. CONCLUSION

This paper presents a case study of data mining modeling techniques for direct marketing. We address some issues which we find as gaps in previous studies, namely:

The most common partitioning procedure for training, validation, and test sets uses random sampling. Although this is a fair way to select a sample, some 'lucky' draws train the model much better than others. Thus, the model instances show variance in behavior and characteristics, influenced by the randomness. In order to address this issue and further to [9], [10], [11], we used a methodology which combines cross-validation (CV), multiple runs over random selection of the folds, and multiple runs over random selection of partitions. Each model was tested many times, involving 3-fold cross-validation, random partitioning, and iterations. We also applied a double-testing procedure with both validation and test sets.

Another contribution is the exploration of the SVM with different kernels and different values of hyper-parameters. The empirical results show that the best performing kernel is the Gaussian RBF with C=3, sigma=0.089 or C=3.5, sigma=0.091, both yielding max ACC=91.108%.

We also analysed how SVM performs with different levels of data saturation and found that the 20% dataset is best for training.
We also analysed how input variable importance affects the model performance and found that eliminating three inputs improves the SVM discriminatory power. Importance metrics were based on sensitivity analysis.

In conclusion, we believe that a rigorous model analysis, involving the issues discussed in the paper, leads to solid and better results.

REFERENCES

[1] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York.
[2] S. Horng, M. Su, Y. Chen, T. Kao, R. Chen, J. Lai, and C. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Systems with Applications, vol. 38, 2010, pp.
[3] L. Fei, W. Li, and H. Yong, Application of least squares support vector machines for discrimination of red wine using visible and near infrared spectroscopy, Intelligent System and Knowledge Engineering, vol. 1, 2008, pp.
[4] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth, CRISP-DM Step-by-step data mining guide, CRISP-DM Consortium.
[5] P. Cortez and M. Embrechts, Using sensitivity analysis and visualization techniques to open black box data mining models, Information Sciences, vol. 225, 2013, pp.
[6] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis, Modeling wine preferences by data mining from physicochemical properties, Decision Support Systems, vol. 47, no. 4, 2009, pp.
[7] R. Kewley, M. Embrechts, and C. Breneman, Data strip mining for the virtual design of pharmaceuticals with neural networks, IEEE Transactions on Neural Networks, vol. 11 (3), 2000, pp.
[8] A. Asuncion and D. Newman, UCI Machine Learning Repository, Univ. of California Irvine, [Online]. Available: edu/ mlearn/mlrepository.html.
[9] S. Moro, R. Laureano, and P. Cortez, Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology, in P. Novais (Ed.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, 2011, pp.
[10] H. Elsalamony and A. Elsayad, Bank Direct Marketing Based on Neural Network, International Journal of Engineering and Advanced Technology, vol. 2, no. 6, 2013, pp.
[11] H. Elsalamony, Bank Direct Marketing Analysis of Data Mining Techniques, International Journal of Computer Applications, vol. 85, no. 7, 2014, pp.
[12] E. Yu and S. Cho, Constructing response model using ensemble based on feature subset selection, Expert Systems with Applications, vol. 30, no. 2, 2006, pp.
[13] H. Shin and S. Cho, Response modeling with support vector machines, Expert Systems with Applications, vol. 30, no. 4, 2006, pp.
[14] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, vol. 27, no. 8, 2005, pp.
[15] P. Cortez, Data Mining with Neural Networks and Support Vector Machines using the R/rminer Tool, Proceedings of the 10th Industrial Conference on Data Mining, Springer, LNAI 6171, 2010, pp.
[16] R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, [Online]. Available:
[17] T. Sing, O. Sander, N. Beerenwinkel, and T. Lengauer, ROCR: visualizing classifier performance in R, Bioinformatics, vol. 21, no. 20, 2005, pp.
