Data Mining For Multi-Criteria Energy Predictions

Kashif Gill and Dennis Moon

Abstract: We present a data mining technique for multi-criteria predictions of wind energy. A multi-criteria (MC) evolutionary computing method is applied to optimize an artificial intelligence learning methodology, Support Vector Machines (SVM). The multi-criteria SVM method is applied and tested on a dataset within North America for predictions of wind energy using climate variables. The SVM training employs a Swarm Intelligence method for multi-criteria optimization. The National Centers for Environmental Prediction (NCEP) global reanalysis gridded dataset is employed in this study. The gridded dataset for this particular application consists of 4- points, each consisting of five variables. To study the impact of higher dimensions on the performance of SVM, Principal Component Analysis (PCA) is applied to the input data to reduce its dimensionality. The results of multi-criteria SVM for the prediction of wind energy are reported with and without PCA pre-processing.

Index Terms: Multi-criteria optimization, Swarm Intelligence, Evolutionary computing, Principal Component Analysis, Support Vector Machines, Wind energy.

Manuscript received August, 9. Kashif Gill and Dennis Moon are with WindLogics, Inc., St Paul, MN 558 USA (corresponding author: 65-556-489; fax: 65-556-4; e-mail: kashif.gill@windlogics.com).

I. INTRODUCTION

This paper describes a method for training a support vector machine (SVM) for wind energy predictions at the wind-farm level. The proposed methodology employs a multi-objective evolutionary optimization approach for training the SVM. The SVM is a powerful learning algorithm developed by Vapnik, primarily for classification problems, and later extended to regression [4 and 5]. The method is well suited to operational prediction and forecasting of wind power, which is an important variable for power utility companies. The multi-objective optimization approach differs from the single-objective one in that the quantity to be optimized is a vector of more than one objective. The current multi-objective methodology employs a Swarm Intelligence based evolutionary computing strategy called Multi-objective Particle Swarm Optimization (MOPSO) [6].

The PSO method was developed for single-objective optimization by R. C. Eberhart and J. Kennedy [1]. It was later extended to multi-objective problems by various researchers, including the method in [6]. The method originates from the swarm paradigm, called particle swarm, and is expected to provide the so-called global or near-global optimum. PSO is characterized by an adaptive algorithm based on a social-psychological metaphor [] involving individuals who interact with one another in a social world. This sociocognitive view can be effectively applied to computationally intelligent systems [8]. The governing factor in PSO is that the individuals, or particles, keep track of the best positions they have obtained so far in the search space, and also of the best positions obtained by their neighboring particles. The best position of an individual particle is called the local best, and the best of the positions obtained by all the particles is called the global best. Hence the global best is what all the particles tend to follow. The algorithmic details of PSO can be found in [1, 2, 3, and 6].

The approach in [6] presents a multiobjective framework for SVM optimization using MOPSO. The multiobjective approach to the PSO algorithm is implemented by using the concept of Pareto ranks and defining the Pareto front in the objective function space. Mathematically, a Pareto optimal front is defined as follows: a decision vector $x_1 \in S$ is called Pareto optimal if there does not exist another $x_2 \in S$ that dominates it. Let $P \subseteq \mathbb{R}^m$ be a set of vectors. The Pareto optimal front $P^* \subseteq P$ contains all vectors $x_1 \in P$ that are not dominated by any vector $x_2 \in P$:

$$P^* = \{\, x_1 \in P \mid \nexists\, x_2 \in P : x_2 \succ x_1 \,\} \qquad (1)$$

In the MOPSO algorithm, as devised in [6], the particles follow the nearest neighboring member of the Pareto front, based on proximity in the objective function (solution) space.
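As an illustration of the Pareto-front definition and the leader assignment described here, the following is a minimal sketch, not the implementation of [6]. It assumes that all objectives are minimized, that proximity is Euclidean distance in objective space, and that the "median" of the front is the middle member after sorting by the first objective. The function names and the sample objective vectors are hypothetical.

```python
import math


def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def assign_leaders(points):
    """Map each particle index to the index of the front member it follows."""
    front = pareto_front(points)
    # Assumption: the median is the middle front member when the front
    # is sorted by its first objective.
    ordered = sorted(front, key=lambda i: points[i][0])
    median = ordered[len(ordered) // 2]
    leaders = {}
    for i, p in enumerate(points):
        if i in front:
            leaders[i] = median  # front members follow the median member
        else:
            # other particles follow the nearest front member in objective space
            leaders[i] = min(front, key=lambda k: math.dist(p, points[k]))
    return leaders


# Hypothetical (objective 1, objective 2) values for five particles.
objs = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (1.5, 6.0)]
front = pareto_front(objs)
leaders = assign_leaders(objs)
```

In this toy population the first three particles form the front, and the two dominated particles are attached to whichever front member lies closest to them.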
At the same time, the particles in the front follow the best individual in the front, which is the median (middle particle) of the Pareto front. The term "follow" refers to the assignments made for each particle in the population to decide its direction and offset (velocity) in the subsequent iteration. These assignments are based on proximity in the objective function (solution) space. The best individual is defined in a relative sense and may change from iteration to iteration, depending on the values of the objective functions. This formulation helps MOPSO avoid getting stuck in local optima when searching the multi-dimensional parameter domain.

In the current research, MOPSO is used to tune the three parameters of SVM: the trade-off or cost parameter C, the epsilon ε, and the kernel width γ. The MOPSO method uses a population of parameter sets that compete against each other over a number of iterations to improve the values of the specified multi-objective criteria (objective functions), e.g., root mean square error (RMSE), bias, histogram error (BinRMSE), correlation, etc. The parameter search narrows progressively onto the regions of interest and avoids getting stuck in local optima.

In our earlier efforts, a single-objective optimization methodology was employed to optimize the three SVM parameters. The approach was tested on a number of sites and the results were encouraging. However, we noticed that a single-objective optimization method can result in sub-optimal predictions when multiple objectives are considered. The single-objective formulation (using PSO) employed the coefficient of determination (COD) as the only objective function, and the resulting distributions were highly distorted when compared with the observed distributions. The COD is a linear function of the mean squared error (the square of RMSE) and ranges from $-\infty$ to 1, with a value of 1 indicating a perfect fit:

$$\mathrm{COD} = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$$

where $O_i$ stands for observed, $P_i$ stands for predicted, $\bar{O}$ stands for the mean of the observed values, and $i$ is the index that runs from 1 to the length of the time series $n$. This behavior is shown in Figure 1, which presents the trade-off curve between BinRMSE and COD.
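The objective functions named in this section can be sketched as follows. RMSE and COD follow their usual definitions, with COD taken as one minus the ratio of the squared prediction error to the squared deviation of the observations about their mean. The paper does not give a formula for BinRMSE, so the version below, the RMSE between the histogram bin counts of observed and predicted values, is an assumption, as are the function names, the bin settings, and the toy data.

```python
import math


def rmse(obs, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def cod(obs, pred):
    """Coefficient of determination: 1 - SSE / (sum of squares about the mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def bin_counts(values, lo, hi, n_bins):
    """Histogram counts over n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        k = int((v - lo) / width)
        counts[max(0, min(k, n_bins - 1))] += 1  # clamp values at the edges
    return counts


def bin_rmse(obs, pred, lo, hi, n_bins):
    """Assumed BinRMSE: RMSE between observed and predicted bin counts."""
    return rmse(bin_counts(obs, lo, hi, n_bins), bin_counts(pred, lo, hi, n_bins))


# Toy data: predicting the observed mean everywhere ignores both tails.
obs = [0.0, 1.0, 2.0, 3.0, 4.0]
flat = [2.0] * 5
cod_flat = cod(obs, flat)
hist_err_flat = bin_rmse(obs, flat, 0.0, 4.0, 4)
```

A prediction that collapses toward the mean piles all its counts into one bin, which the histogram-based objective penalizes even when the pointwise error is moderate. This is the kind of distortion a second objective is meant to expose.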
The COD value increases as the BinRMSE value increases. The corresponding histograms are shown in Figure 2 for each of the extreme ends of the curve (maximum COD and minimum BinRMSE) and for the best compromise solution. These histograms make it clear that the solution with the best COD (or RMSE) is the one with the highest BinRMSE, and that it misses the extreme ends of the distribution. Thus, however tempting it is to achieve the best COD value, that solution does not cover the extreme ends of the distribution. On the other hand, the best BinRMSE comes at the cost of the lowest COD (or highest RMSE) and is not desirable either. A multi-objective scheme is therefore required that optimizes these objectives simultaneously and provides a trade-off surface from which a compromise solution between the two objectives can be chosen. Figure 2 also shows the histogram for the best compromise solution, which provides a reasonable distribution when compared with the observed data.

Figure 1: Trade-off curve between BinRMSE and COD (x-axis: BinRMS; y-axis: COD; markers: best BinRMS, best COD, compromise).

II. MATERIAL AND METHOD

The current procedures primarily employ SVM to build regression models for assessing and forecasting wind resources. The primary inputs to the SVM come from the NCEP reanalysis gridded data [7] centered on the wind farm location. The target is the measured aggregate power of the wind farm. The training uses a k-fold cross-validation scheme referred to as the Round-Robin strategy. The idea of Round-Robin is to divide the available training data into two sets, using one for training and holding out the other for testing the model. In this particular Round-Robin strategy the data is divided into months: training is done on all the months except one, and testing is done on the hold-out month. The current operational methods employ manual calibration of the SVM parameters in assessment projects and a simple grid-based parameter search in forecasting applications.
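The month-wise Round-Robin scheme can be sketched as a leave-one-month-out split. This is a minimal illustration under the assumption that every sample carries a month label; the function name and the labels are hypothetical.

```python
def round_robin_splits(month_labels):
    """Yield (held_out_month, train_indices, test_indices) for each month."""
    for month in sorted(set(month_labels)):
        test = [i for i, lab in enumerate(month_labels) if lab == month]
        train = [i for i, lab in enumerate(month_labels) if lab != month]
        yield month, train, test


# Hypothetical month label for each sample in the training period.
labels = ["2008-01", "2008-01", "2008-02", "2008-03", "2008-03"]
splits = list(round_robin_splits(labels))
```

Concatenating the hold-out predictions from every split gives one out-of-sample prediction per training sample, which is what allows a full-length comparison against the observations.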
The goal in using MOPSO is to explore the regions of interest with respect to the specific multi-objective criteria in an efficient way. Another attractive feature of MOPSO is that it results in a so-called Pareto parameter space, which accounts for parameter uncertainty between the two objective functions. The result is thus an ensemble of parameter sets clustered around the so-called global optimum with respect to the multi-objective space. This ensemble of parameter sets also gives trade-offs between the different objective criteria. The MOPSO-SVM method is tested on data from an operational assessment site in North America. The results are compared with observed data using a number of evaluation criteria on the validation sets. As stated above, the data from 4 NCEP grid points, each consisting of 5 variables (total variables), is used.
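The paper does not state how a compromise solution is picked from the ensemble. One common heuristic, shown here purely as an assumption, is to rescale each objective to [0, 1] across the Pareto set and take the member closest to the ideal point at the origin. The function name and the objective values below are hypothetical, not results from the paper.

```python
import math


def compromise_index(front):
    """Index of the front member closest to the ideal point after rescaling."""
    n_obj = len(front[0])
    lo = [min(p[k] for p in front) for k in range(n_obj)]
    hi = [max(p[k] for p in front) for k in range(n_obj)]

    def distance_to_ideal(p):
        # Rescale each objective to [0, 1]; a constant objective contributes 0.
        scaled = [
            (p[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in range(n_obj)
        ]
        return math.sqrt(sum(s * s for s in scaled))

    return min(range(len(front)), key=lambda i: distance_to_ideal(front[i]))


# Hypothetical (BinRMSE, RMSE) pairs, both minimized.
front = [(30.0, 0.40), (40.0, 0.36), (60.0, 0.35)]
best = compromise_index(front)
```

Rescaling matters here because the two objectives live on very different numeric scales; without it the histogram objective would dominate the distance.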

Figure 2: Histogram comparison between the best BinRMS, the best COD, and the best compromise solution (x-axis: power (kW); y-axis: counts).

Figure 3: Trade-off curve between the two objectives, BinRMS vs. RMSE, for SVM and PCA-SVM.

The problem is therefore high-dimensional and can pose a difficult task for SVM. To study this impact, Principal Component Analysis (PCA) is applied and the principal components (PCs) explaining 95% of the variance are included as inputs to SVM. The comparison is made with an SVM that does not use PCA as a pre-processing step. MOPSO requires a population of parameter sets to be evolved through a number of iterations, competing against each other to obtain an optimum (minimum in this case) value for BinRMSE and RMSE. In the current formulation, a 5 member population is evolved over successive iterations within MOPSO for wind power predictions at the wind farm.

III. RESULTS AND DISCUSSION

The MOPSO-SVM results are shown with and without PCA pre-processing. The wind farm site is located in Canada and has 9 months of energy data available. MOPSO is used to train SVM over the available 7 months of training data (at 6-hourly time resolution) using a Round-Robin cross-validation strategy. This gives an opportunity to train on 6 months and test the results on a hold-out test set of one month. Repeating the process for all the 7 months gives a full 7 months of test data to compare against the observed. Since there is 9 months of data available for this site, a full year of data is used in validation (completely unseen data). The results that follow are the predictions on the validation set. The results are shown on the normalized (between -1 and 1) dataset.
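The rule of keeping the PCs that explain 95% of the variance amounts to a cumulative sum over the sorted eigenvalues of the input covariance matrix. The sketch below assumes those eigenvalues have already been computed by some PCA routine; the function name and the eigenvalues are hypothetical.

```python
def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues of the input covariance matrix.
eigenvalues = [6.0, 3.0, 0.7, 0.2, 0.1]
n_keep = n_components_for_variance(eigenvalues)
```

With these values the first two components explain 90% of the variance and the first three explain 97%, so three components are retained.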

Figure 4: Scatter plot of the mean monthly wind energy data for SVM and PCA-SVM (y-axis: predicted).

Figure 5: Histogram for SVM and PCA-SVM along with observed (x-axis: power (kW)).

The trade-off curve for the MOPSO-SVM optimization of the two objectives is shown in Figure 3. The trade-off between BinRMS and RMSE is shown for the SVM using the original inputs and for the SVM using principal components (PCs) as inputs. A best-fit line is also shown for each of the two curves. There is little difference between the two approaches. PCA-SVM produced a better objective function value for BinRMS, whereas the simple SVM produced a better objective function value for RMSE.

Figure 4 shows the monthly mean wind power for the validation months compared against the observed data. The results are shown for SVM prediction with and without PCA pre-processing. As stated above, there is very little difference between the two approaches, and both give predictions in reasonable agreement with the observed data.

Figure 5 shows the histogram of observed vs. predicted wind power at the 6-hourly time resolution (the prediction time step). The results are shown for SVM prediction with and without PCA pre-processing. The distributions are well maintained using the MOPSO methodology, and a reasonable agreement between observed and predicted power is evident from the figure.

A number of goodness-of-fit measures are evaluated in Table 1: monthly root mean square error (RMSE), monthly coefficient of determination (COD), instantaneous RMSE, instantaneous COD, and BinRMSE (histogram bin RMSE). The results in Table 1 are presented for SVM prediction with and without PCA pre-processing. Both monthly and instantaneous wind power are of significant interest, and thus both are included in the current analysis. Not only the monthly but also the instantaneous power predictions are in close agreement with the observed.
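The distinction between monthly and instantaneous measures can be illustrated with a short sketch that averages the 6-hourly values by month before computing RMSE. The aggregation by simple monthly mean is an assumption about how the monthly measures are formed, and the labels and values below are hypothetical.

```python
import math
from collections import defaultdict


def rmse(obs, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def monthly_means(month_labels, values):
    """Mean value per month, returned in sorted month order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, value in zip(month_labels, values):
        sums[month] += value
        counts[month] += 1
    return [sums[m] / counts[m] for m in sorted(sums)]


# Hypothetical normalized power values at the prediction time step.
labels = ["m1", "m1", "m2", "m2"]
obs = [0.2, 0.4, -0.1, 0.1]
pred = [0.4, 0.2, 0.1, -0.1]

instantaneous_rmse = rmse(obs, pred)
monthly_rmse = rmse(monthly_means(labels, obs), monthly_means(labels, pred))
```

In this toy case the errors within each month cancel, so the monthly RMSE is zero while the instantaneous RMSE is not. Averaging generally hides errors at the prediction time step, which is why reporting both kinds of measure is informative.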

Table 1: Wind energy goodness-of-fit (MC-SVM)

Goodness measure      SVM      PCA-SVM
Monthly RMSE          .53      .64
Monthly COD           .954     .934
Instantaneous RMSE    .38      .38
Instantaneous COD     .735     .734
BinRMSE               37.758   36.49

IV. CONCLUSIONS

In the present paper, a multi-objective evolutionary computing method, MOPSO, is used to optimize the three parameters of SVM for wind energy predictions. The approach has been tested on data from a wind farm using NCEP reanalysis grid data. The prediction strategy employs SVM with and without PCA pre-processing. A number of graphical and tabular results in the form of goodness-of-fit measures are presented for wind energy predictions. The SVM predictions at the wind farm level showed excellent agreement with the observed data for the validation set. The results for the two approaches are quite similar, but SVM without PCA pre-processing produced slightly better results. Overall, the results are encouraging, and the MOPSO-SVM approach is recommended for other operational projects in the area of wind power prediction and forecasting. While further modifications and advancements are underway, the current procedure is sound enough to be applied in operational settings.

REFERENCES

[1] Eberhart, R. C., and J. Kennedy, A new optimizer using particle swarm theory, in Proceedings of the Sixth International Symposium on Micro Machine and Human Science, MHS '95, 1995, pp. 39-43, doi:10.1109/MHS.1995.494215, IEEE Press, Piscataway, N. J.
[2] Kennedy, J., and R. C. Eberhart, Particle swarm optimization, in Proceedings of IEEE International Conference on Neural Networks, IV, vol. 4, 1995, pp. 1942-1948, doi:10.1109/ICNN.1995.488968, IEEE Press, Piscataway, N. J.
[3] Eberhart, R. C., R. W. Dobbins, and P. Simpson, Computational Intelligence PC Tools, Elsevier, New York, 1996.
[4] Vapnik, V., The Nature of Statistical Learning Theory, Springer, New York, 1995.
[5] Vapnik, V., Statistical Learning Theory, John Wiley, Hoboken, N. J., 1998.
[6] Gill, M. K., Y. H. Kaheil, A. Khalil, M. McKee, and L. Bastidas, Multiobjective particle swarm optimization for parameter estimation in hydrology, Water Resour. Res., 42, 2006, W07417, doi:10.1029/2005WR004528.
[7] Kalnay et al., The NCEP/NCAR 40-year reanalysis project, Bull. Amer. Meteor. Soc., 77, 1996, 437-471.
[8] Kennedy, J., R. C. Eberhart, and Y. Shi, Swarm Intelligence, Morgan Kaufmann, San Francisco, CA, 2001.