Non-Linear Continuum Regression Using Genetic Programming


Non-Linear Continuum Regression Using Genetic Programming

Ben McKay (Ben.McKay@ncl.ac.uk), Mark Willis (Mark.Willis@ncl.ac.uk), Dominic Searson (D.P.Searson@ncl.ac.uk), Gary Montague (Gary.Montague@ncl.ac.uk)
Dept. Chemical & Process Engineering, University of Newcastle, NE1 7RU, UK
http://lorien.ncl.ac.uk/control/

Abstract

In this contribution, genetic programming is combined with continuum regression to produce two novel non-linear continuum regression algorithms. The first is a sequential algorithm while the second adopts a team-based strategy. Having discussed continuum regression, the modifications required to extend the algorithm for non-linear modelling are outlined. The results of applying the derived non-linear algorithms to the development of an inferential model of a food extrusion process are then presented. The superior performance of the sequential algorithm, as compared to a similar non-linear partial least squares algorithm, is demonstrated. In addition, these results clearly demonstrate that the team-based strategy significantly outperforms the sequential approach.

1 INTRODUCTION

In this paper Genetic Programming (GP) is used to develop two non-linear continuum regression algorithms for building input-output models of chemical process systems. A team-based strategy is investigated and compared to a more traditional sequential approach.

Why are chemical engineers interested in data-based modelling?

Recent years have seen an increase in the emphasis on product 'quality', economic process performance, environmental and safety issues in the process industries, and these factors have placed significant demands on existing operational procedures. Process monitoring, optimisation and advanced control have the potential to satisfy many of these demands. However, in order to realise the benefits of these techniques, it is generally necessary to have an accurate model of the process. While it may be possible to develop such a model from first principles (using a detailed knowledge of the physics and chemistry of a system), there are a number of drawbacks to this approach.
As many industrial process systems are extremely complex and relatively poorly understood, the development of a realistic model can take a considerable amount of time and effort. In addition, the modelling process inevitably involves a number of simplifying assumptions that have to be made in order to provide a tractable solution. Therefore, a mechanistic model will often be costly to develop and may be subject to inaccuracies. Consequently, data-based modelling (using plant data to build an input-output model that describes the response of process outputs to changes in inputs, without attempting to represent the underlying process mechanisms) presents a popular alternative.

Why use non-linear models?

Many chemical process systems are non-linear in nature. Therefore, in order to achieve sufficient model accuracy, a non-linear model structure is typically required.

Why not use a standard GP algorithm?

The performance of a standard GP algorithm when used to develop non-linear models of chemical process systems (using either simulated or industrial input-output data) can be disappointing when compared to other non-linear regression techniques such as feedforward neural networks (e.g. see McKay, 1997; Hiden, 1998). Generally, a GP algorithm is much slower than competitive techniques (especially for increasing numbers of input variables) and the prediction errors on unseen (test) data are typically higher, indicating a less accurate model.

Why the interest in Continuum Regression?

Our recent work has shown that strategies that reduce the algorithm's search space can help the overall performance of a GP algorithm (when used for model development). A number of multivariate statistical modelling techniques provide systematic procedures for the decomposition of an input-output relationship. By extracting and structuring information in an appropriate manner, these techniques can be used to reduce the effective search space. In Hiden et al.
(1998) it was demonstrated that the combination of GP with one such technique, known as Partial Least Squares (PLS), could produce results of comparable accuracy to alternative non-linear regression techniques such as neural networks. However, PLS is merely one of a family of linear multivariate statistical modelling

techniques. It is Continuum Regression (CR) that provides a unified framework encompassing all of these techniques. Wise and Ricker (1997) demonstrated the effectiveness of CR, developing models of improved accuracy when compared to alternative approaches (without sacrificing model robustness and generalisation). However, standard CR algorithms produce linear models. In this paper, we combine GP and CR to produce non-linear CR algorithms, the objective being the development of an algorithm that can develop accurate and robust non-linear models.

How is this paper organised?

The next section provides some background information on linear regression. CR is then introduced and the range of techniques it encompasses is explained. Next, the modifications required to extend the algorithm to a non-linear strategy are outlined. Two approaches are proposed, a sequential and a team-based algorithm. Having detailed the algorithms, their performance on a benchmark example is studied: the development of an inferential model of a food extrusion process. After discussion of the results, conclusions are drawn and recommendations for further work are made.

2 LINEAR REGRESSION

Given a set of input and output measurements for a process, a typical modelling objective is to obtain a relationship that can be used to explain the variation in an (n x 1) output vector, y, in response to changes in an (n x m) input matrix, X. This can be expressed mathematically as follows:

y = f(X) + e    (1)

where the function f(.) is chosen so as to minimise the vector of prediction errors, e. If f(.) is linear in the parameters, then the input-output model reduces to:

y = Xr + e    (2)

where r is an (m x 1) vector of regression coefficients. A family of linear multivariate statistical modelling algorithms has been developed for the purpose of estimating r. Three of the more common and effective of these are Multiple Linear Regression, Principal Component Regression and Partial Least Squares.

What is Multiple Linear Regression?
Multiple Linear Regression (MLR) uses one of the many variants of the Least Squares technique (such as Batch Least Squares) to parameterise Equation 2. A drawback of this method is that it fails to produce accurate and robust models if the input data is correlated (the columns of input data are linearly related to each other). This is the norm when data is collected from chemical process systems.

What is Principal Component Regression?

PCR avoids the problems associated with the modelling of correlated input data (i.e. singular solutions or imprecise parameter estimations) by transforming the inputs into a new set of uncorrelated data (typically of reduced dimensionality), and then performing MLR between this transformed data set and the output data. The procedure is based on a technique known as Principal Component Analysis (PCA), one of the oldest and best documented multivariate statistical techniques (Pearson, 1901). PCA is performed on the input data, X, and generates a set of transformed input variables (known as principal components) that are uncorrelated and ranked in terms of significance. This allows the least significant principal components (ones that only describe process noise) to be discarded prior to modelling. More rigorously, PCA rotates X to produce an (n x m) scores matrix, T, and an (m x m) loadings matrix, V (describing the required rotation), such that T = XV. The loadings matrix V contains the eigenvectors of the input data correlation matrix, X'X. The scores matrix, T, contains orthogonal projections of the input variables, sorted (from left to right) in terms of decreasing contribution to the variance in the input data. Retaining the first p principal components leads to a reduced (n x p) scores matrix T_p and an (m x p) loadings matrix V_p. To perform PCR, batch least squares is used to regress T_p against y, giving the (p x 1) regression vector b, such that y = T_p b + e.
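The PCR procedure just described can be sketched in a few lines of numpy (a minimal illustration; the function and variable names are ours, and the paper's normalisation of the data is reduced to simple mean-centring):

```python
import numpy as np

def pcr_fit(X, y, p):
    """Principal Component Regression: PCA on X, then batch least squares on
    the first p score columns. X is (n x m), y is (n,)."""
    # Mean-centre the data (the paper also normalises; omitted for brevity)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Eigendecomposition of X'X; sort components by decreasing variance
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(eigvals)[::-1]
    Vp = V[:, order[:p]]                         # (m x p) loadings
    Tp = Xc @ Vp                                 # (n x p) scores
    b, *_ = np.linalg.lstsq(Tp, yc, rcond=None)  # regress T_p against y
    return Vp @ b                                # equivalent x-space coefficients

# Highly correlated inputs: plain MLR is ill-conditioned, PCR stays stable
rng = np.random.default_rng(0)
x1 = rng.normal(size=(200, 1))
X = np.hstack([x1, x1 + 1e-6 * rng.normal(size=(200, 1))])
y = X @ np.array([1.0, 2.0])
r = pcr_fit(X, y, p=1)
print(r.shape)  # (2,)
```

Discarding the second (noise-only) component gives a well-conditioned fit even though X'X is nearly singular, which is exactly the failure mode of MLR described above.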
The input-output mapping is then given as,

y = X V_p b + e    (3)

Unfortunately, as PCA only considers the variation in the input data, there is no guarantee that the variation represented by the major principal components will correspond to the variation that best describes the relationship with the output variable.

What is Partial Least Squares?

Partial Least Squares (PLS) was first described in Wold (1966). As with PCR, the PLS algorithm overcomes problems associated with measurement noise and correlation between input variables. However, an additional advantage of PLS (as compared to PCR) is that by considering both the input variance and the input-output covariance, the algorithm provides the opportunity for more accurate model development. A good introduction to the technique can be found in Geladi and Kowalski (1986). PLS is commonly implemented using the NIPALS (Non-linear Iterative Partial Least Squares) algorithm. NIPALS sequentially extracts pairs of latent vectors (analogous to the columns of the scores matrix in PCA) from the input and output data. A univariate (single input, single output) regression is performed at each stage, to model the relationship between these latent vector pairs. The final model is then constructed by summing the contribution from each step. More rigorously, PLS (starting with i = 1, X_1 = X and y_1 = y) sequentially extracts latent vector pairs, t_i and u_i, from X_i and y_i, in order of decreasing predictive power. For the single output case, the vector u_i is equal to y_i, while the vector t_i corresponds to the projection of X_i in the

direction most correlated to u_i (i.e. t_i = X_i w_i, where w_i are the optimal input projection weights). The vectors t_i and u_i are then regressed to obtain a univariate linear model with a regression parameter b_i (such that u_i = b_i t_i + e_i). As each latent vector is calculated, the information used is deducted from the input-output data and these residuals (X_{i+1} and y_{i+1}) are used to calculate the next set of latent vectors. This procedure is repeated until additional latent vector pairs fail to improve the model performance. The final model is then constructed by summing the contributions from each of the significant latent vector pairs,

y^ = sum_{i=1}^{N_LV} b_i X_i w_i    (4)

where y^ is the estimate of the output y. Substituting the expression for calculating the input residuals (X_{i+1} = X_i - t_i w_i') into Equation 4, and recalling that X_1 = X, gives,

y^ = X w_1 b_1 + X(I - w_1 w_1') w_2 b_2 + ... + X(I - w_1 w_1') ... (I - w_{N_LV-1} w'_{N_LV-1}) w_{N_LV} b_{N_LV}    (5)

Using the fact that the w_i are orthogonal (and thus w_i' w_j = 0, for i != j), this reduces to a form that is equivalent to Equation 2,

y = sum_{i=1}^{N_LV} X w_i b_i + e    (6)

Using PLS, difficulties associated with modelling correlated input variables are avoided by calculating latent variables sequentially and only performing a univariate regression at each stage. Problems associated with measurement noise are also overcome by projecting the inputs and outputs onto a lower-dimensional space. In addition, considering both the input variance and the input-output covariance leads to improved model accuracy.

How are the methods related?

For a linear input-output mapping, if all principal components or latent variables are retained when performing PCR or PLS, the two methods are equivalent to MLR. It is primarily the difference in emphasis placed on variation in the input data compared to covariance between the inputs and the output that distinguishes the various techniques. A mathematical framework for unifying these techniques will be introduced in the next section.

3 CONTINUUM REGRESSION

Stone and Brooks (1990) and Wise and Ricker (1993) have demonstrated that MLR, PCR and PLS can be unified under a single framework known as Continuum Regression.
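The NIPALS iteration of Equations 4-6 can be sketched directly in numpy (single-output case; names and the small numerical guard are ours):

```python
import numpy as np

def pls_nipals(X, y, n_lv):
    """Single-output PLS via sequential extraction of latent vector pairs
    (Equations 4-6): at each stage a univariate inner model u_i = b_i t_i + e_i
    is fitted, then the inputs and output are deflated."""
    Xi, yi = X.astype(float).copy(), y.astype(float).copy()
    r = np.zeros(X.shape[1])          # accumulates Eq. 6: y = sum_i X w_i b_i + e
    for _ in range(n_lv):
        w = Xi.T @ yi                 # direction of greatest covariance with u_i = y_i
        nw = np.linalg.norm(w)
        if nw < 1e-12:                # nothing left to explain
            break
        w /= nw                       # normalise weights to unit length
        t = Xi @ w                    # latent vector t_i = X_i w_i
        b = (t @ yi) / (t @ t)        # univariate regression of u_i on t_i
        Xi -= np.outer(t, w)          # residuals: X_{i+1} = X_i - t_i w_i'
        yi -= b * t                   # y_{i+1} = y_i - b_i t_i
        r += b * w
    return r

# Noiseless example: y is exactly 2*x, so one latent variable recovers the slope
X = np.array([[1.0], [2.0], [3.0]])
print(pls_nipals(X, np.array([2.0, 4.0, 6.0]), n_lv=1))  # [2.]
```

Because the deflation X_{i+1} = X_i - t_i w_i' makes successive w_i orthogonal, accumulating b_i w_i reproduces the collapsed coefficient vector of Equation 6.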
At one extreme of the continuum is PCR, at the other is MLR, while PLS is in the middle. A qualitative interpretation of this concept can be obtained by considering this continuum in terms of the emphasis that is placed on the variance in the input data, when compared to the covariance between the inputs and the outputs. With MLR, only the input-output covariance is considered when formulating the model (in fact the inputs are regarded as independent). With PCR, a model is developed between the output and the significant principal components. These components represent the most important variations in the input data, but their covariance with the output is not necessarily maximised. With PLS, a compromise between MLR and PCR is obtained by consideration of both the correlation in the input data and the input-output covariance. CR allows a smooth transition between these techniques.

What is the CR algorithm?

The CR algorithm proposed by Wise and Ricker (1993) first creates a transformed data set by applying a sophisticated scaling technique to the input data. This effectively warps the input space to place either more or less emphasis on existing correlations (depending on a factor to be referred to as the continuum coefficient). PLS (using NIPALS, as outlined in Section 2) is then performed on the transformed data set. The first step in transforming the input data is to decompose X using the singular value decomposition (SVD) as,

X = U Sigma V'    (7)

where U is an (n x n) matrix of the eigenvectors of XX' and V is an (m x m) matrix of the eigenvectors of the data correlation matrix, X'X. The diagonal elements of the (n x m) matrix Sigma are the positive square roots of the eigenvalues of X'X and are called the singular values (all other elements of Sigma are zero). A modified X matrix, X_mu, is then formed as,

X_mu = U Sigma^mu V'    (8)

where the matrix Sigma^mu is the singular values raised to the power mu (0 <= mu <= infinity), and mu will be referred to as the continuum coefficient.
The standard NIPALS algorithm is then applied to X_mu and y, giving a final model of the following form,

y^ = sum_{i=1}^{N_LV} b_i X_mu,i w_i    (9)

In a similar manner to Equation 4, this can be converted into a form that is equivalent to Equation 2 as follows,

y = sum_{i=1}^{N_LV} X V Sigma^(mu-1) V' w_i b_i + e    (10)

What is the qualitative interpretation of the continuum coefficient?

mu -> 0: CR tends towards MLR.
mu = 1: CR is equivalent to PLS.
mu -> infinity: CR tends towards PCR.
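The input-space warping of Equations 7-8 is a one-liner on top of an SVD. A sketch, under one assumption of ours: the singular values are rescaled by their maximum so that Sigma^mu stays bounded as mu grows (the paper does not state its scaling):

```python
import numpy as np

def continuum_transform(X, mu):
    """Warp the input space as in Eq. 8: X_mu = U diag(s)^mu V'.
    With singular values normalised by s.max(): mu -> 0 equalises all
    directions (MLR-like), mu = 1 returns X scaled by 1/s.max() (PLS),
    and large mu suppresses all but the dominant directions (PCR-like)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = s / s.max()            # assumed normalisation: keeps s**mu bounded
    return (U * s**mu) @ Vt

X = np.random.default_rng(1).normal(size=(50, 4))
X1 = continuum_transform(X, 1.0)   # X up to the 1/s.max() scale factor
X0 = continuum_transform(X, 0.0)   # columns become orthonormal directions
```

Running NIPALS on the transformed matrix then traces out the MLR-PLS-PCR continuum as mu varies, exactly as the qualitative interpretation above describes.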

4 NON-LINEAR CR USING GP

Two approaches for combining GP and CR to create a non-linear CR algorithm are proposed. The first evolves inner models sequentially, while the second evolves a team of inner models simultaneously.

What is the sequential algorithm?

The sequential non-linear CR algorithm performs a GP run each time a non-linear inner model is required (i.e. at each iteration of the latent variable loop). The mathematical details of the algorithm can be found in Appendix 1. At Step 8, a standard GP algorithm minimises the cost function J_i by judicious choice of the non-linear function f_i(.). The optimal continuum coefficient (mu) is obtained by an outer loop that performs a unidimensional search using Brent's method (Press et al., 1992).

What is the team-based algorithm?

Rather than perform a GP run each time a non-linear inner model is required, it is possible to evolve a team of inner models simultaneously using a multi-population GP algorithm. Each team is composed of N_P individuals (each an inner model), drawn from N_P separate populations {P_1, ..., P_NP}. Team members are assigned specific tasks, based on the population from which they are drawn (e.g. members of population P_1 are assigned the task of developing an inner model between the first latent vector pair, while population P_j contains candidates for the j-th inner model). In this manner, a team of non-linear functions {f_1(.), ..., f_NP(.)} is evolved to minimise the overall model prediction error (e in Equation 1). Fitness is calculated on a team basis. The fitness of a team is determined by implementing the CR algorithm given in Appendix 1 (using the inner models defined by the given team). For the i-th latent variable pair, at Step 8, the team member corresponding to the i-th inner model is evaluated. Each individual in a team is then assigned the team's fitness. Each population is homogeneous (i.e. only individuals from the same population can be crossed over). The crossover strategy employed does not maintain the integrity of teams (i.e. it is non-cohesive). The various populations are treated independently when selecting individuals for crossover (rather than crossing over complete teams). Prior to fitness evaluation, a team's continuum coefficient (mu) was determined in one of two ways. Either mu was optimised using Brent's method (with a probability of 0.1), or a random Gaussian perturbation was made to the value of mu associated with the fittest team of the previous generation. This perturbed value of mu was then used to evaluate the team's fitness.

5 CONFIGURATION OF THE GP ALGORITHMS

Table 1 shows the values of the most important GP algorithm control parameters, as well as a summary of the terminal and function sets used.

Table 1: Configuration of the GP Algorithms

                                            Sequential                            Team-based
Function set:                               +, -, /, *, protected ^, protected sqrt, exp(), protected ln(), tanh()
Terminal set:                               t_i, constants                        {t_1, ..., t_m}, constants
Output:                                     A single inner model for the          A team of N_P inner models
                                            i-th latent variable pair
Population size:                            40                                    500
Fitness function:                           Linear ranking
Crossover probability:                      0.7
Mutation probability:                       0.2
Direct reproduction probability:            0.1
Proportion of elite individuals retained:   0.1
Termination criterion:                      40 Gens.                              60 Gens.
Parameter optimisation:                     Yes                                   No
Total # of runs:                            20
Average # individuals processed per run:    5.0x10^5                              2.5x10^5

It may be noted that the team-based algorithm used a significantly larger population than the sequential. However, as the sequential algorithm was employing a parameter optimiser (Levenberg-Marquardt non-linear least squares), the sequential algorithm actually processed twice as many individuals per run.

6 EXPERIMENTAL METHOD

For all of the results presented, the Root Mean Square (RMS) error between the actual and predicted output was calculated based on an unseen (or test) data set. This test set is used to verify the generalisation of the model. Due to the stochastic nature of the GP algorithm, using it on anything but the simplest of systems produces different results for every run. Therefore, in order to investigate the performance of a GP-based algorithm for model building, multiple runs were performed for each technique.
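For reference, our reading of the test-set criterion used throughout ("RMS error between the actual and predicted output") is the plain root-mean-square error:

```python
import numpy as np

def rms_error(y_true, y_pred):
    """Root Mean Square error between actual and predicted output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```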
7 MODELLING A FOOD EXTRUSION PROCESS

As described in Elsey et al. (1997), the parameters of this simulated process have been fitted to plant data obtained from a pilot-scale APV Baker MPF 40 cooking extruder, processing a mixture of corn flour and water. Given the screw speed, feed rate, feed moisture content, feed temperature and barrel temperature profile, the model calculates the degree of gelatinisation of the product. This was selected as the process output of interest in these studies. Therefore, the terminal set for this system consists of four variables: screw speed, feed rate, feed moisture content and feed temperature. Four hundred steady-state readings were available for modelling. This data was split evenly, with the first 200 points being used

to train the models and the remaining 200 points reserved for testing. This data set has become our benchmark example for comparing non-linear modelling techniques. A selection of pertinent results is presented in Table 2. These results demonstrate that standard GP performs poorly when compared to a feedforward neural network (FANN). This led to the development of two GP-based non-linear PLS algorithms. The first, GP_PLS1 (the so-called 'quick and dirty' approach), is equivalent to the sequential algorithm presented in this paper when the continuum coefficient is set to one. The second, GP_PLS2, demonstrated that optimisation of the input projection weights (w_i) can be beneficial in reducing the model prediction errors. This is because, in linear PLS, the w_i is calculated to achieve maximum linear covariance between t_i and u_i even though the inner models are in fact non-linear. Therefore, for a given non-linear function, w_i may not correspond to the direction that produces the best approximation of u_i. Hiden et al. (1998) demonstrated that GP_PLS2 could produce results of comparable accuracy to a FANN. However, it was concluded that optimising w_i was an unacceptable computational burden (especially with increasing numbers of input variables), limiting the application of the algorithm. As such, in this work no attempt was made to optimise the input projection weights.

Figure 1 shows a comparison of the error distributions obtained using the sequential and team-based algorithms, while Table 3 summarises the average and best errors.

Figure 1: Model error distributions on the cooking extruder data (histogram of test-set errors for the sequential and team-based algorithms).

Table 2: Summary of prior modelling work on the cooking extruder data.

Method         Mean Test RMS   Min. Test RMS   Source
FANN           0.03            0.024           McKay (1997)
Standard GP    0.280           0.188           Hiden (1998)
PLS - Linear   0.925           0.925           Hiden et al. (1998)
GP_PLS1        0.049           0.044           Hiden et al. (1998)
GP_PLS2        0.022           0.017           Hiden et al. (1998)

Table 3: Results for the sequential and team-based algorithms on the cooking extruder data.

Method       Mean Test RMS   Min. Test RMS   mu (Best Model)   N_LV (Best Model)
Sequential   0.0354          0.0316          1.09              3
Team-based   0.0252          0.0243          0.6               3

It may be noted that the sequential algorithm outperforms GP_PLS1. This is because the optimisation of mu has a similar effect to optimising the input weights (in that it changes the projection of the input data in an attempt to maximise the model fit). However, as there is only one continuum coefficient (as compared to w_i, which is an (m x 1) vector, where m is the number of input variables), the optimisation problem becomes more tractable. It is also apparent that the team-based approach produces significantly lower errors on the test set, compared to the sequential algorithm. This is despite the fact that the sequential algorithm processed more individuals (as shown in Table 1). The sequential algorithm finds the best fit for each latent variable pair in turn. This procedure effectively constrains the type of solutions the algorithm will find. While resulting in good local solutions at each step, it is conjectured that it may be prone to being trapped in local minima. In comparison, the team-based algorithm optimises all of the inner models simultaneously. As fitness is calculated on a team basis, this allows a trade-off between team member contributions to the overall fitness measure. This provides the potential for escape from local minima and the possibility of obtaining a more global optimum. The best set of inner models obtained using the team-based algorithm is given by Equations 11, 12 and 13,

u_1 = 2.6 exp(tanh(tanh(exp(4 t_1)))) - 4.58    (11)

u_2 = 0.0498 (tanh(t_2) t_2 + (2 t_2 - 0.0474)/exp(t_2) + u_1 - 0.0503 t_2^2) - 0.096    (12)

u_3 = 0.80 (t_3 - 0.038) tanh(2 t_3) - 0.0352    (13)

8 CONCLUSIONS

The food extrusion benchmark example shows that non-linear CR using GP gives superior performance to a similar implementation of non-linear PLS. More specifically, the sequential non-linear CR algorithm outperforms GP_PLS1 of Hiden et al. (1998).
It is conjectured that this is because the optimisation of mu has a similar effect to optimising the input projection weights; both change the projection of the input data in an attempt to maximise the model fit. However, optimising the continuum coefficient is less computationally intensive than optimising w_i (which is an (m x 1) vector, where m is the number of input variables). The case study also demonstrates the effectiveness of the team-based approach to non-linear CR, as compared to the sequential algorithm. By finding the best fit for each latent variable pair in turn, the sequential technique effectively constrains the type of solutions the algorithm will find. While concentrating on achieving good inner models at each step, it may be prone to being trapped in local minima. By optimising all of the inner models simultaneously, the team-based algorithm allows trade-offs between individuals' contributions to the overall team

fitness. This may allow the evolution of a more globally optimal solution. As explained in Section 4, the team-based algorithm employs a non-cohesive reproduction strategy. This means that teams are broken up during reproduction (i.e. the integrity of the team is not maintained). If only one type of team strategy is going to work (i.e. if there is only one good way of performing each task), then disrupting the integrity of the teams will probably not have a detrimental effect. While we believe that the non-linear CR modelling task is an example of such a system (where the form of each inner model, and hence the task of each team member, is largely dictated by the data set), future work will examine the effect of implementing a cohesive crossover strategy. As noted in Section 7, the team-based non-linear CR algorithm does not outperform GP_PLS2, the weight-optimised non-linear PLS algorithm of Hiden et al. (1998). However, it was considered that optimising the projection weights was currently an unacceptable computational burden. As such, finding an improved method of implementing the input weight optimisation will be a focus of future work.

9 REFERENCES

Elsey, J., Riepenhausen, J., McKay, B., Barton, G.W. and Willis, M.J. (1997), Modelling and Control of a Food Extrusion Process, Computers and Chemical Engineering, Vol. S21, pp. S361-S366.

Geladi, P. and Kowalski, B.R. (1986), Partial least squares regression: A tutorial, Analytica Chimica Acta, Vol. 185, pp. 1-17.

Hiden, H.G. (1998), Data-Based Modelling using Genetic Programming, PhD Thesis, Dept. Chemical and Process Engineering, University of Newcastle, UK.

Hiden, H.G., McKay, B., Willis, M.J. and Montague, G.A. (1998), Non-linear partial least squares using genetic programming, Proc. 3rd Annual Conf. on Genetic Programming, University of Wisconsin, Madison, USA, July 22-25, pp. 128-133.

McKay, B. (1997), Studies in Data-Based Modelling, PhD Thesis, Dept. Chemical Engineering, The University of Sydney, Australia.

McKay, B., Willis, M.J.
and Barton, G.W. (1997), Steady-state modelling of chemical process systems using genetic programming, Computers and Chemical Engineering, Vol. 21, No. 9, pp. 981-996.

Pearson, K. (1901), On lines and planes of closest fit to systems of points in space, Phil. Mag., Ser. 6, 2(11), pp. 559-572.

Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. (1992), Numerical Recipes in C: The Art of Scientific Computing, 2nd Ed., Cambridge University Press, USA.

Stone, M. and Brooks, R.J. (1990), Continuum regression: Cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares, and principal components regression, J. R. Statist. Soc. B, 52, pp. 337-369.

Wise, B.M. and Ricker, N.L. (1993), Identification of finite impulse response models with continuum regression, Journal of Chemometrics, Vol. 7, pp. 1-14.

Wold, S. (1966), Non-linear estimation by iterative least squares procedures, Research Papers in Statistics, Ed. David, F., Wiley, New York, USA.

APPENDIX 1: Non-Linear Continuum Regression Algorithm

Sequential Algorithm: The algorithm in Table A1 represents a single function evaluation for a given value of mu. This is called repeatedly by Brent's method to optimise mu. Note that a full GP run is performed at Step 8 to optimise the fit of each inner model.

Team-based Algorithm: The algorithm in Table A1 is called by a multi-population GP algorithm as a single fitness evaluation, for a given team of inner models and a given mu.

(1) Mean centre and normalise the input and output data (X and y).

(2) Perform SVD on the scaled input data: U Sigma V' = X, with dimensions (n x n)(n x m)(m x m).

(3) Generate the modified input data: X_mu = U Sigma^mu V', an (n x m) matrix.

(4) Set i = 1 and set the output latent vector u_1 equal to the output: u_1 = y, an (n x 1) column vector.

(5) Calculate the weights w_i that correspond to the direction in X_mu,i with the greatest covariance with u_i: w_i = (X_mu,i)' u_i, an (m x 1) column vector.

(6) Normalise w_i to unit length: w_i = w_i / ||w_i||, an (m x 1) column vector. (Note: ||.|| refers to the Euclidean norm.)

(7) Calculate the latent vector t_i, which is the projection of X_mu,i onto the direction defined by w_i: t_i = X_mu,i w_i, an (n x 1) column vector.
(8) Sequential: minimise J_i by choice of f_i(.); Team-based: use the supplied f_i(.), where

J_i = (u_i - f_i(t_i))' (u_i - f_i(t_i))

(9) Calculate the residual matrices: X_mu,i+1 = X_mu,i - t_i w_i', an (n x m) matrix; y_i+1 = y_i - f_i(t_i), an (n x 1) column vector.

(10) If i < N_LV, {i = i + 1; Return to Step 5}.

(11) Construct the final model: y^ = sum_{i=1}^{N_LV} f_i(X_mu,i w_i).

Table A1: The non-linear continuum regression algorithm.
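The procedure in Table A1 is straightforward to prototype. A sketch in numpy, with two assumptions of ours: the singular values are normalised by their maximum before warping, and a trivial least-squares inner model stands in for the GP run of Step 8 (the GP search and the Brent outer loop over mu are out of scope here):

```python
import numpy as np

def nonlinear_cr(X, y, mu, n_lv, fit_inner):
    """Table A1 as code. fit_inner(t, u) returns a callable f_i; in the paper
    this is found by a GP run (sequential) or supplied by a team (team-based)."""
    # Steps 1-3: mean-centre, SVD, warp input space by mu
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xmu = (U * (s / s.max())**mu) @ Vt
    u = yc.copy()                           # Step 4: u_1 = y
    models, ws = [], []
    for _ in range(n_lv):
        w = Xmu.T @ u                       # Step 5: max-covariance direction
        nw = np.linalg.norm(w)
        if nw < 1e-12:
            break
        w /= nw                             # Step 6: unit length
        t = Xmu @ w                         # Step 7: latent vector
        f = fit_inner(t, u)                 # Step 8: inner model f_i
        Xmu = Xmu - np.outer(t, w)          # Step 9: deflate inputs...
        u = u - f(t)                        # ...and output residual
        models.append(f); ws.append(w)      # kept for the Step 11 final model
    return models, ws

# Linear least-squares inner model in place of GP (illustration only)
def linear_inner(t, u):
    b = (t @ u) / (t @ t)
    return lambda t_new: b * t_new
```

With linear_inner and mu = 1 this collapses back to the PLS of Section 2; predicting on new data replays the same warp-project-deflate sequence using the stored weights and inner models.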