Data mining based automated reverse engineering and defect discovery

Similar documents
Decision Support for Rule and Technique Discovery in an Uncertain Environment

IP Network Design by Modified Branch Exchange Method

Segmentation of Casting Defects in X-Ray Images Based on Fractal Dimension

RANDOM IRREGULAR BLOCK-HIERARCHICAL NETWORKS: ALGORITHMS FOR COMPUTATION OF MAIN PROPERTIES

Controlled Information Maximization for SOM Knowledge Induced Learning

Fuzzy Logic Resource Management and Coevolutionary Game-based Optimization

An Unsupervised Segmentation Framework For Texture Image Queries

A modal estimation based multitype sensor placement method

Journal of World s Electrical Engineering and Technology J. World. Elect. Eng. Tech. 1(1): 12-16, 2012

Obstacle Avoidance of Autonomous Mobile Robot using Stereo Vision Sensor

Communication vs Distributed Computation: an alternative trade-off curve

Point-Biserial Correlation Analysis of Fuzzy Attributes

A VECTOR PERTURBATION APPROACH TO THE GENERALIZED AIRCRAFT SPARE PARTS GROUPING PROBLEM

Positioning of a robot based on binocular vision for hand / foot fusion Long Han

ADDING REALISM TO SOURCE CHARACTERIZATION USING A GENETIC ALGORITHM

Detection and Recognition of Alert Traffic Signs

Color Correction Using 3D Multiview Geometry

Illumination methods for optical wear detection

Optical Flow for Large Motion Using Gradient Technique

HISTOGRAMS are an important statistic reflecting the

Conversion Functions for Symmetric Key Ciphers

Performance Optimization in Structured Wireless Sensor Networks

Clustering Interval-valued Data Using an Overlapped Interval Divergence

Input Layer f = 2 f = 0 f = f = 3 1,16 1,1 1,2 1,3 2, ,2 3,3 3,16. f = 1. f = Output Layer

Lecture # 04. Image Enhancement in Spatial Domain

Spiral Recognition Methodology and Its Application for Recognition of Chinese Bank Checks

A Novel Automatic White Balance Method For Digital Still Cameras

ANALYTIC PERFORMANCE MODELS FOR SINGLE CLASS AND MULTIPLE CLASS MULTITHREADED SOFTWARE SERVERS

Topological Characteristic of Wireless Network

Slotted Random Access Protocol with Dynamic Transmission Probability Control in CDMA System

Conservation Law of Centrifugal Force and Mechanism of Energy Transfer Caused in Turbomachinery

A Shape-preserving Affine Takagi-Sugeno Model Based on a Piecewise Constant Nonuniform Fuzzification Transform

The EigenRumor Algorithm for Ranking Blogs

Module 6 STILL IMAGE COMPRESSION STANDARDS

Towards Adaptive Information Merging Using Selected XML Fragments

A Recommender System for Online Personalization in the WUM Applications

Assessment of Track Sequence Optimization based on Recorded Field Operations

FINITE ELEMENT MODEL UPDATING OF AN EXPERIMENTAL VEHICLE MODEL USING MEASURED MODAL CHARACTERISTICS

XFVHDL: A Tool for the Synthesis of Fuzzy Logic Controllers

Fifth Wheel Modelling and Testing

IP Multicast Simulation in OPNET

Automatically Testing Interacting Software Components

Transmission Lines Modeling Based on Vector Fitting Algorithm and RLC Active/Passive Filter Design

Image Compression using Genetic Algorithm

On the Forwarding Area of Contention-Based Geographic Forwarding for Ad Hoc and Sensor Networks

Embeddings into Crossed Cubes

Frequency Domain Approach for Face Recognition Using Optical Vanderlugt Filters

On Error Estimation in Runge-Kutta Methods

Annales UMCS Informatica AI 2 (2004) UMCS

Simulation and Performance Evaluation of Network on Chip Architectures and Algorithms using CINSIM

And Ph.D. Candidate of Computer Science, University of Putra Malaysia 2 Faculty of Computer Science and Information Technology,

Erasure-Coding Based Routing for Opportunistic Networks

Bo Gu and Xiaoyan Hong*

Prioritized Traffic Recovery over GMPLS Networks

Extract Object Boundaries in Noisy Images using Level Set. Final Report

Effects of Model Complexity on Generalization Performance of Convolutional Neural Networks

A Memory Efficient Array Architecture for Real-Time Motion Estimation

DEADLOCK AVOIDANCE IN BATCH PROCESSES. M. Tittus K. Åkesson

Combinatorial Mobile IP: A New Efficient Mobility Management Using Minimized Paging and Local Registration in Mobile IP Environments

Accurate Diffraction Efficiency Control for Multiplexed Volume Holographic Gratings. Xuliang Han, Gicherl Kim, and Ray T. Chen

The Internet Ecosystem and Evolution

Desired Attitude Angles Design Based on Optimization for Side Window Detection of Kinetic Interceptor *

Information Retrieval. CS630 Representing and Accessing Digital Information. IR Basics. User Task. Basic IR Processes

Improvement of First-order Takagi-Sugeno Models Using Local Uniform B-splines 1

Parallel processing model for XML parsing

Generalized Grey Target Decision Method Based on Decision Makers Indifference Attribute Value Preferences

Number of Paths and Neighbours Effect on Multipath Routing in Mobile Ad Hoc Networks

Multi-azimuth Prestack Time Migration for General Anisotropic, Weakly Heterogeneous Media - Field Data Examples

Adaptation of Motion Capture Data of Human Arms to a Humanoid Robot Using Optimization

SYSTEM LEVEL REUSE METRICS FOR OBJECT ORIENTED SOFTWARE : AN ALTERNATIVE APPROACH

A Neural Network Model for Storing and Retrieving 2D Images of Rotated 3D Object Using Principal Components

New Algorithms for Daylight Harvesting in a Private Office

THE THETA BLOCKCHAIN

Comparisons of Transient Analytical Methods for Determining Hydraulic Conductivity Using Disc Permeameters

An Optimised Density Based Clustering Algorithm

Scaling Location-based Services with Dynamically Composed Location Index

A New and Efficient 2D Collision Detection Method Based on Contact Theory Xiaolong CHENG, Jun XIAO a, Ying WANG, Qinghai MIAO, Jian XUE

An Extension to the Local Binary Patterns for Image Retrieval

A ROI Focusing Mechanism for Digital Cameras

Lecture 27: Voronoi Diagrams

Image Enhancement in the Spatial Domain. Spatial Domain

AUTOMATED LOCATION OF ICE REGIONS IN RADARSAT SAR IMAGERY

SANTA FE TRAIL FOR ARTIFICIAL ANT BY MEANS OF ANALYTIC PROGRAMMING AND EVOLUTIONARY COMPUTATION

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma

Hierarchically Clustered P2P Streaming System

Reachable State Spaces of Distributed Deadlock Avoidance Protocols

Hierarchical Region Mean-Based Image Segmentation

ART GALLERIES WITH INTERIOR WALLS. March 1998

An Energy-Efficient Approach for Provenance Transmission in Wireless Sensor Networks

CLUSTERED BASED TAKAGI-SUGENO NEURO-FUZZY MODELING OF A MULTIVARIABLE NONLINEAR DYNAMIC SYSTEM

Reader & ReaderT Monad (11A) Young Won Lim 8/20/18

UCB CS61C : Machine Structures

A Minutiae-based Fingerprint Matching Algorithm Using Phase Correlation

EYE DIRECTION BY STEREO IMAGE PROCESSING USING CORNEAL REFLECTION ON AN IRIS

POMDP: Introduction to Partially Observable Markov Decision Processes Hossein Kamalzadeh, Michael Hahsler

10/29/2010. Rendering techniques. Global Illumination. Local Illumination methods. Today : Global Illumination Modules and Methods

An Improved Resource Reservation Protocol

Title. Author(s)NOMURA, K.; MOROOKA, S. Issue Date Doc URL. Type. Note. File Information

UCLA Papers. Title. Permalink. Authors. Publication Date. Localized Edge Detection in Sensor Fields.

Topic -3 Image Enhancement

Transcription:

Data mining based automated evese engineeing and defect discovey James F. Smith III, ThanhVu H. Nguyen Naval Reseach Laboatoy, Code 5741, Washington, D.C., 20375-5000 ABSTRACT A data mining based pocedue fo automated evese engineeing and defect discovey has been developed. The data mining algoithm fo evese engineeing uses a genetic pogam (GP) as a data mining function. A GP is an evolutionay algoithm that automatically evolves populations of compute pogams o mathematical expessions, eventually selecting one that is optimal in the sense it maximizes a fitness function. The system to be evese engineeed is typically a senso that may not be disassembled and fo which thee ae no design documents. The senso is used to ceate a database of input signals and output measuements. Rules about the likely design popeties of the senso ae collected fom expets. The ules ae used to ceate a fitness function fo the GP allowing GP based data mining. This pocedue incopoates not only the expets ules into the fitness function, but also the infomation in the database. The infomation extacted though this pocess is the intenal design specifications of the senso. These design popeties can be used to ceate a fitness function fo a genetic algoithm, which is in tun used to seach fo defects in the digital logic design. Significant theoetical and expeimental esults ae povided. Keywods: data mining, knowledge discovey, genetic pogams, genetic algoithms, evese engineeing, defect discovey 1. INTRODUCTION An enginee has a boowed system and must detemine what design flaws may be pesent. The owne of the system will not allow it to be disassembled, and the design specifications fo the system ae not available. The enginee howeve has access to good infomation about the peiod in which the system was designed and constucted, the cost of the system and as such the most likely pats the system used. The enginee also knows that the oiginal enginees who designed the system wee subect to significant cost constaints, so the design is vey efficient. Finally the enginee has access to a histoical database of input and output data associated with the system. Couldn t the enginee appoach the system in a tial an eo fashion and eventually find the flaw? It might be possible to do this, but if the system is complex the enginee might not live long enough to find the poblem. Finally, expeimental time can be extemely costly, if the enginee can pedetemine the most likely input pio to actual expeimentation in the lab, then the savings can be enomous. So it would be useful if an automated pocedue wee available fo discoveing design flaws. To solve this poblem, a data mining 1 based pocedue fo automated evese engineeing and defect discovey has been developed. The data mining algoithm fo evese engineeing uses a genetic pogam (GP) as a data mining function. A genetic pogam is an algoithm based on the theoy of evolution that automatically evolves populations of compute pogams o mathematical expessions, eventually selecting one that is optimal in the sense it maximizes a measue of effectiveness, efeed to as a fitness function 2-5. The system to be evese engineeed is typically a senso. The senso is used to ceate a database of input signals and output measuements. Rules about the likely design popeties of the senso ae collected fom expets. The ules ae used to ceate a fitness function fo the genetic pogam. Genetic pogam based data mining is then conducted 3-5. This pocedue incopoates not only the expet s ules into the fitness function, but also the infomation in the database. The infomation extacted though this pocess is the intenal design specifications of the senso. The design popeties extacted though this pocess can be used to Coespondence: Email: fsmith@dsews.nl.navy.mil 232 Data Mining, Intusion Detection, Infomation Assuance, and Data Netwoks Secuity 2005, edited by Belu V. Dasaathy, Poceedings of SPIE Vol. 5812 (SPIE, Bellingham, WA, 2005) 0277-786X/05/$15 doi: 10.1117/12.602077

ceate a fitness function fo a genetic algoithm 6-15 (GA), which is in tun used to seach fo defects in the senso design. Unlike a GP, a GA manipulates stings of numbes, not compute pogams, with the intent of poducing optimal esults by maximizing a fitness function. Significant theoetical and expeimental esults ae povided. Section 2 intoduces the ideas of genetic algoithms and genetic pogams. Section 3 discusses data mining and the use of a genetic pogam as a data mining function. Section 4 examines the digital logic design to be evese engineeed using genetic pogam based data mining. Section 5 explains the genetic pogam s teminal set, function set, fitness function, stopping citeia, paametes, and data mining esults. Section 6 discusses how a genetic algoithm can be used to automatically discove defects in the design of digital logic. Section 7 gives an example of a defect discoveed by the genetic algoithm in the design of the digital logic evese engineeed by the genetic pogam. Finally, section 8 povides conclusions. 2. GENETIC ALGORITHMS AND GENETIC PROGRAMS The following subsections descibe genetic algoithms and genetic pogams. Diffeences between the algoithms stuctue, applications as well as some chaacteistics specific to the GP used fo data mining ae discussed. 2.1 Genetic algoithm A genetic algoithm is an optimization method that manipulates a sting of numbes in a manne simila to how chomosomes ae changed in biological evolution 6-15. An initial population made up of stings of numbes is chosen at andom o is specified by the use. Each sting of numbes is called a chomosome and each numbeed slot is called a gene. A set of chomosomes foms a population. Each chomosome epesents a given numbe of taits that ae the actual paametes that ae being vaied to optimize the fitness function. The fitness function is a pefomance index that is to be maximized. The opeation of the genetic algoithm poceeds in steps. Beginning with the initial population, selection is used to choose which chomosomes should suvive to fom a mating pool. Chomosomes ae chosen based on how fit they ae (as computed by the fitness function) elative to the othe membes of the population. Moe fit individuals end up with moe copies of themselves in the mating pool so that they will moe significantly effect the fomation of the next geneation. Next, two opeations ae taken on the mating pool. Fist, cossove (which epesents mating, the exchange of genetic mateial) occus between paents. In cossove, a andom spot is picked in the chomosome, and the genes afte this spot ae switched with the coesponding genes of the othe paent. Following this, mutation occus. Mutation is a change of the value of a andomly selected gene. Afte the cossove and mutation opeations occu, the esulting stings fom the next geneation and the pocess is epeated. Anothe pocess known as elitism is also employed. Elitism consists of copying a cetain numbe of fittest individuals into the next geneation to make sue they ae not lost fom the population. Finally, a temination citeion is used to specify when the algoithm should stop, e.g., when a peset maximum numbe of geneations has been eached, the fitness has gained its maximum possible value o that the fitness has not changed significantly in a cetain numbe of geneations. 2.2 Genetic pogam A genetic pogam 2-5 is a poblem independent method fo automatically ceating gaphs that epesent compute pogams, mathematical expessions, digital cicuits, etc. Like a genetic algoithm it evolves a solution using Dawin s pinciple of suvival of the fittest. Unlike the genetic algoithm, of which it can be consideed an extension: its initial, intemediate, and final populations ae compute pogams. Like a genetic algoithm, the individuals that make up the population ae subect to selection, cossove, mutation and elitism. The cossove and mutation opeations ae constained to poduce stuctually valid offsping, i.e., functional compute pogams, digital cicuits, etc. Two additional GP opeations not used with GAs ae achitectue alteing steps 2 (AAS) and symmetical eplication (SR). Poc. of SPIE Vol. 5812 233

AAS is a method intoduced to impove the GP s convegence. This function andomly changes genes in elite chomosomes, i.e., typically the 50 chomosomes with the highest fitness. Fo the digital logic diagam of Figue 1, AAS might consist of changing H 1 to H 3 o SUM_sig123 to SUM_sig2. This notation is explained in geate detail in sections 4 and 5. SR efes to eplacing a banch entiely with anothe banch that aleady existed on the chomosome. SR is simila to mutation, it is not used often and its pupose is to beak out of the local fitness landscape, if possible. Like mutation, this eplication happens afte a specific numbe of geneations that have the same maximum fitness. SR is designed to help the GP escape the neighbohood of what may be a local maximum so that the GP can locate the global maximum. Finally, it is geneally applied to only a small andom set of the elite chomosomes, this allows the GP to bette seach the space while maintaining divesity in the population. GPs equie a teminal set and function set as inputs. The teminals ae the actual vaiables of the poblem. These can include a vaiable like x used as a symbol in building a polynomial and also eal constants. The function set consists of a list of functions that can opeate on the vaiables. Fo example, in a pevious application of a GP as a data mining function to evolve fuzzy decision tees symbolically 3-5, the teminal set consists of fuzzy membeship functions and the function set consists of logical connectives like AND, OR and logical modifies like NOT. Since fuzzy logic allows moe than one mathematical epesentation of AND and OR, the function set could be complicated. When the GP is used as a data mining function, a database of input and output infomation is equied. In the case of fuzzy decision tees the database epesented a collection of scenaios about which the fuzzy decision tee to be evolved would make decisions. The database also had enties ceated by expets epesenting decisions about the scenaios. The optimal fuzzy decision tee would be the one that could most closely epoduce the expets decisions about the scenaios. When the GP is used as a data mining function fo evolving digital logic (DL), the database contains inputs to the DL as well as measued outputs. The expets opinions ae manifested in the selection of the input and associated output to be included in the database. Fo the DL case an additional fom of input consisting of ules about DL constuction ae included. Finally, the teminal set, function set, database to be data mined, and ules fo DL constuction ae descibed in geate detail in sections 4 and 5. 3. EVOLUTIONARY ALGORITHM BASED DATA MINING Data mining is the efficient extaction of valuable non-obvious infomation embedded in a lage quantity of data 1. Data mining consists of thee steps: the constuction of a database that epesents tuth; the calling of the data mining function to extact the valuable infomation, e.g., a clusteing algoithm, neual net, genetic algoithm, genetic pogam, etc; and finally detemining the value of the infomation extacted in the second step, this geneally involves visualization. When used fo evese engineeing, the GP, typically data mines a database to detemine a gaph-theoetic stuctue, e.g., a system s DL diagam o an algoithms flow chat o decision tee 3-5. The GP mines the infomation fom a database consisting of input and output values, e.g., a set of inputs to a senso and its measued outputs. GP based data mining will be applied to the constuction of the DL descibed in section 4. To use the genetic pogam it is necessay to constuct teminal and function sets elevant to the poblem. Befoe the specific teminal and function sets fo the evese engineeing poblem ae descibed, a moe detailed desciption of the digital logic to be consideed will b e given in the section 4. 4. THE DIGITAL LOGIC TO BE REVERSE ENGINEERED The DL to be evese engineeed is depicted in Figue 1. This DL is not known to the GP. The GP only has access to a database of input signals to the DL and measued output, as well as, a database of ules povided by expets fo building the DL. In Figue 1 the DL consists of thee input channels each with a senso attached. The sensos eceive signals fom souces one, two and thee. Only measuements fom souces two, the cental souce is of inteest. Due to the geomety of the souces and popeties of the sensos only senso two can eceive emissions fom souce two that ae 234 Poc. of SPIE Vol. 5812

(2) (4) significant. Unfotunately, senso two s measuement may be coupted by emissions fom souces one and thee. The digital logic is constucted so that if thee wee significant couption of senso two s measuements, then the final ORgate etuns unity, so the measuements can be ignoed. 4.1 Desciption of the DL and its elements In Figue 1 thee ae a numbe of DL elements depicted that ae used epeatedly. DL components and signals will ultimately become elements of the GP s teminal and function sets. The sensos in Figue 1 will eceive an analog signal and convet it to a digital fom, i.e., they will map eal-valued input to the set of integes. A sampling window of size N is used, i.e., the signal is sampled evey t seconds fo a total of N samples in that window. The sample is indicated by the vecto s in (1) with sampling beginning at time t o. The -subscipt implies the signal oiginate in the th souce, whee =1,2,3, s = [ s ( t ), s ( t + t),..., s ( t + ( N 1) t)]. (1) o o The DL function, SUM, given explicitly in (2), epesents the logaithmic sum of the absolute value of the time components of the digitized input that has been eceived fo a single window of length N. o SUM ( s ) N ln s k = = 1 ( t o + ( k 1) t) The elements labeled H i, fo i=1,2,3, ae Heaviside step functions as given in (3). If the input is geate than o equal to a theshold, τ i, fo i=1,2,3; then a value of unity is tansmitted, othewise a zeo is tansmitted. 1, Hi ( s) = 0, if if s τ i s < τ i (3) The DL function, MAX, given in (4), etuns the natual logaithm of the maximum absolute value of the time components of the input signal fo a single window of length N. The element labeled DIFF, takes the diffeence between input to its fist and second teminals as indicated in (5). N MAX ( s ) = ln s k= 1 ( t o + ( k 1) t) DIFF( I1, I2) = I1 I (5) 2 The DL function, Odelay3, takes only Boolean inputs, i.e., it expects zeo o one as an input. It waits until it has thee consecutive inputs fom thee consecutive time windows, hence the 3 in its name. Once it eceives thee consecutive inputs, it yields as an output the maximum of its inputs. Also not depicted, but used in the GP s function set ae Anddelay3, which takes thee inputs of zeo o one coesponding to thee consecutive time windows and yields as output the minimum of its inputs. Anddelay2 uses only two inputs and etuns thei maximum. Finally, the symbols labeled AND3, OR3, AND2, and OR2 ae the conventional logical connectives AND and OR, with the numeical designation indicating the numbe of inputs expected, e.g., AND3 expects thee Boolean inputs. The signals ae additive, at any given time senso two may ecod a supeposition of the thee souces tansmissions, which is epesented by s 1 (t)+ s 2 (t) + s 3 (t). If the thee sensos signals ae of sufficient magnitude then this is chaacteistic of couption and the final OR in Figue 1 etuns unity. Poc. of SPIE Vol. 5812 235

5. TERMINAL SET, FUNCTION SET, FITNESS FUNCTION, PARAMETERS, AND DATA MINING RESULTS This section descibes the teminal set, function set, data base constuction, fitness function, the GP s essential paametes and the esults of data mining fo the DL epesented by Figue 1. The desciption is given in tems of DL elements and popeties, but the genetic pogam based evese engineeing technique is vey geneal and can be applied to any system that can be descibed in a gaph theoetic language, e.g., decision pocesses descibed in tems of decision tees 3-5. 5.1 Teminal and function sets The teminal set consists of the following elements: T={SUM_sig123, MAX_sig123, SUM_sig1, MAX_sig1, SUM_sig3, MAX_sig3} (6) whee SUM_sig123 = SUM( s1 + s2 + s3 ) (7) MAX_sig123 = MAX( s1 + s2 + s3 ) (8) SUM_sig1 = SUM( s 1 ) (9) MAX_sig1 = MAX( s 1 ) (10) SUM_sig3 = SUM( s 3 ) (11) MAX_sig3 = MAX( s 3 ) (12) All senso measuements begin at time, t o. The function set consists of the following elements: F={AND3, OR3, AND2, OR2, ANDDELAY3, ORDELAY3, H1, H2, H3, DIFF} (13) The function ANDDELAY3 is not used in Figue 1. By including it, the GP s ability to disciminate against extaneous functions is emphasized. The GP s goal is to evolve a solution that epesents Figue 1. The GP s ability to do this will be detemined lagely by the fitness function and the undelying databases to be discussed. The chomosome to be evolved by the GP in pefix notation that epesents Figue 1 is given in (14). OR2 ORDELAY3 AND3 H 3 SUM_SIG3 H 2 DIFF SUM_SIG3 SUM_SIG123 H 1 MAX_SIG123 ORDELAY3 AND3 H 3 SUM_SIG1 H 2 DIFF SUM_SIG1 SUM_SIG123 H 1 MAX_SIG123 (14) 5.2 Ceating the database The database used fo constucting the fitness function consists of inputs to the system to be evese engineeed 236 Poc. of SPIE Vol. 5812

and measued output. Fo the digital logic diagam of Figue 1 the input is a collection of signals fom souces one though thee and the associated measued output. The output of the digital logic epesented by Figue 1 will consists entiely of zeos and ones. The fact that the system maps eal valued input into the doublet set {0,1} complicates the evese engineeing pocess. Fo a case like this the ule set used to constuct the fitness function is of paamount impotance if a unique solution is to be found. These ules coespond to conditions povided by human expets egading what elements the logic may have, how they ae inteconnected, and thei abundance. 5.3 Calculating the fitness based on ules and pasimony pessue This subsection discusses a few of the ules used to constuct the fitness function. Typically, each ule epesents a desiable featue that the final solution must have o might have. Towad this goal, candidate solutions, i.e., chomosomes within the GP s evolving population ae given a highe fitness fo showing chaacteistics consistent with the ules. 5.3.1 Rule examples and thei ewads Fo the digital logic depicted in Figue 1, a numbe of ules can be developed without knowing the specific design. Fo example, when actually using the DL o upon close analysis of input -output elations it is obseved that the logic waits fo a peiod coesponding to thee consecutive time windows befoe making a declaation. Based on the logic s function the following ules ae fomulated Rule 1: Thee must be a thee time window time delay built into the DL. Othe ules povided by expets ae Rule 2: The digital logic must have a delay nea the output level. Since data and expet intuition seem to indicate that thee is an AND o OR at the end, the ule follows Rule 3: The last opeation is a Boolean. It follows fom the popeties of the hadwae implementation of the Heaviside step function and the DIFF opeato that Rule 4: The Heaviside step function must take an intege input Rule 5: The DIFF opeato must have a two signal input Likewise, Rule 6 epesents commonsense Rule 6: Thee is no point in having a functional composition of Heaviside step functions. In each case when the GP detemines a candidate solution satisfies the ule, the fitness fo that solution is incemented by unity. Othe ules wee used, but the above set captues the spiit of how ules ae obtained and ewads ae awaded. So it is obseved some of the ules, like Rule 1 aise fom an obsevation, but othes like Rule 3 aise fom popeties of the logic elements o in the case of Rule 6 fom common sense. 5.3.2 Contolling bloat One of the fundamental poblems associated with genetic pogamming is contolling bloat. Bloat 16 efes to excessive gowth of candidate solutions. It is found empiically that evey 50 geneations the length of candidate solutions, i.e., chomosomes within a GP s evolving population inceases by a facto of thee. Vaious pocedues have Poc. of SPIE Vol. 5812 237

been invented fo contolling bloat. Two of the most popula ae the Koza depth limit 2 and pasimony pessue 3-5, 16. Anothe technique that involves the use of compute algeba to contol bloat when using a GP to evolve mathematical expessions has been invented ecently 3. This algebaic appoach was not used fo the study at hand but may be applied in a futue extension of this wok. The pocedue applied fo evolving the digital logic of Figue 1 was pasimony pessue. The basic fitness is the fitness detemined by the input-output and ule database. The pasimony pessue fo a given chomosome is calculated by multiplying the pasimony coefficient, α, by the length of the chomosome. The oveall fitness is detemined by subtacting the pasimony pessue fom the basic fitness. Thus given two candidate solutions with the same basic fitness, the shote of the two will have the highe oveall fitness. The pasimony pessue can be egaded as a numeical epesentation of Occam s azo 17. 5.4 Setting the GP s paametes The GP equies the specification of many paametes fo efficient opeation. Modification to these paametes can have a significant impact on the convegence time and solution found by the GP. Some of the most impotant paametes ae population size, database size, pobability of mutation, pobability of cossove, and pasimony pessue. The size of the population of evolving chomosomes is vey impotant, a lage population can epesent a significant degee of divesity, potentially esulting in the detemination of a bette solution. Inceased population size can also have a significant effect on un time. The default population size used fo the DL depicted in Figue 1 is 1000. Inceasing the size of the database can have a significant effect. A lage database may contibute to the detemination ultimately of a unique solution o a smalle final class of high fitness potential solutions. Unfotunately, given that the digital logic in Figue 1 maps fom eal-valued inputs to the doublet set {0,1}, the benefits of a lage database ae limited. It becomes impotant to embed high quality ules in the fitness function, ewading good candidate solutions, i.e., those consistent with the ules and penalizing poo quality chomosomes. The pobabilities of cossove and mutation can also be vey impotant. Low pobabilities of cossove and mutation can esult in only limited egions of the fitness landscape being exploed, i.e., the GP will get locked into a local neighbohood and not escape. Two othe paamete dependent opeations ae AAS and SR which ae applied on to the elite chomosomes, i.e., the top 50 with the highest fitness. When applying AAS, the GP selects about 63% of the elite chomosomes in each geneation. Likewise, SR is applied to seven pecent of the elite chomosomes that have been though 12 geneations whee the maximum fitness has not changed. 5.5 Data mining esults The GP successfully discoveed the DL of Figue 1 though data mining afte 60 geneations using the paametes given in the pevious subsection. The SR opeation poved to be vey effective and helped acceleate convegence significantly. 6. AUTOMATED DEFECT DISCOVERY USING A GENETIC ALGORITHM The second phase of this wok, i.e., automatic defect detection involves use of a genetic algoithm instead of a genetic pogam. Automatic defect detection as pesented hee is an optimizatio n poblem as opposed to a data mining poblem and as such it will be teated in less detail than evese engineeing. The function of the DL in Figue 1 is to detemine if emissions fom souces one and thee ae coupting the measued emission fom souce two. If only data fom souce two is measued then the DL is expected to etun a zeo, but if senso two s measuements ae coupted by contibutions fom souces one and thee then the DL etuns unity. If the output of the DL is unity then measuement is discaded. The intention of the design enginees was that the DL would allow elatively uncoupted measuements of the output of souce two, to be ecoded. 238 Poc. of SPIE Vol. 5812

Fotunately, souces one and thee do not always emit in the same time window as souce two, allowing the output of souce two to be measued. Souce two is a sufficiently weak emitte that any enegy fom it that goes into sensos one o thee can be neglected. In addition, it is assumed that the magnitude and phase of the signal fom souce one measued by senso two is the same as the magnitude and phase of the signal measued by senso one. The same assumption is made about measuements of the signal fom souce thee by senso thee and senso two. Given the design indicated by Figue 1, if the oiginal design enginees wee not caeful, it might be possible fo sensos one and thee to eceive signals s 1 and s 3 of sufficiently low magnitude that the DL does not ecognize them as couption. At the same time they obscue the chaacteistics of the desied signal, s2 boadcasts by the souce two and eceived by senso two. Such an event would epesent a significant defect. It is this defect that the GA actually found. The defect found by the GA eflects the fitness function used. It is possible fo the GA to find many othe defects given othe fitness functions. The defect signal descibed above, that will escape detection by the DL takes the fom S D = s1 + s2 + s (15) 3, such that thee is so little vaiation in the magnitude of way of potentially satisfying the constaints is to look fo a supeposition signal, vaiation and the signals s 1 and s3 SD, that the signatue fom of s2 SD do not esult in a declaation of unity by the DL. is no longe obseved. One, such that it is flat, i.e., has no The DL evese engineeed by the GP yields unity, if the signals s 1 and s 3 ae of sufficient magnitude duing any thee consecutive time windows. In this section, it is assumed that each time window has a length five, thus thee intevals give a time window of length 15. The selection of a time window of length five was made to povide a simple example. It is assumed that the fom of the signal, s2, boadcast by souce two is known, wheeas the signals boadcast by souces one and thee ae unknown and may vay in time. The GA will attempt to find signals s 1 and s 3 such that SD shows little vaiation ove thee consecutive time windows while at the same time the DL declaes zeo. The fitness function includes ules to facilitate finding this solution. It is impotant to obseve that the DL is built into the fitness function and used fo numeical calculation. The chomosomes fo this application epesent the values of candidate solutions fo s 1 and s 3. It is not necessay to solve fo s 2, since it is assumed to be known. Also, unlike GP based data mining, thee is no database equied fo automatic defect discovey. The chomosome size is 2 x length(time window) x 3, o 30. The facto of two aises because the chomosome epesents both s 1 and s 3 ; the facto of thee, because measuements fom thee consecutive time windows ae included fo both signals. The thee time window measuements of s 1 ae encoded in the fist half of the chomosome, positions one to 15. The second half of the chomosome, positions 16 to 30, epesents s 3. Chomosomes wee selected to epesent both signals to allow genetic infomation to be exchanged between the signals though cossove. The GA uses the following paamete values: the population size is 1000, chomosome size 30, the pobability of cossove is 90 pecent, and pobability of mutations 10 pecent. The elitism size, i.e., the numbe of highest fitness individuals automatically etained fom the pevious geneation is 50. The GA uses as stopping citeia that the fitness must exceed a theshold,.99 o that 1000 geneations have passed. These paametes wee detemined based on expeience with the DL poblem and GAs. Poc. of SPIE Vol. 5812 239

7. AN EXAMPLE OF AUTOMATIC DEFECT DISCOVERY The pogam conveges afte about 300 geneations. The following assumption is made: ove thee consecutive time windows senso two measues the following data [-1, 2, -3, 20, -4, -2, 18, -1, 1, 2, -1, -2, 17, -1, 1]. Fo the assumed valued of senso two s measuements the genetic pogam found the following solution fo s 1 and s3 ove the same thee consecutive time windows senso one measuements = [-10, -22, 24, 31, 58, 41, -51, -17, -46, -30, -25, -10, 13, -29, 9 ] (16) senso thee measuements = [51, -20, 19, -11, -14, 1, -7, -22, 5, -12, -14, -28, 10, -10, 30 ] (17) S It is obseved that fo the chomosome found that the DL etuns zeo and the 15 element of SD ae all 40, i.e., D shows no vaiation ove the thee time windows. Thus the GA has found a significant design flaw in the DL. It is eadily obseved that this example is vey simple. It was included to illustate the potential of the GA based pocedue fo automatically finding flaws in designs. Detection flaws that ae moe complex and hade to discove will be the subect of a futue publication. 8. CONCLUSIONS Given a system that may not be disassembled, subected to inspection and fo which the design specification ae unavailable, detemining the design of its undelying digital logic can be difficult. A genetic pogam has been used as a data mining function to evese enginee digital logic. The database that was subected to data mining consisted of known input to the digital logic, the associated measued output and a set of ules povided by expets elating to thei assumptions about the digital logic. The genetic pogam has been poven to be capable of evese engineeing complex digital logic designs. It is found that having a set of expet ules in the database is essential; the measued output of the digital logic is aely sufficient to uniquely evese enginee the design. Fequently, pat of the motivation fo evese engineeing digital logic is to undestand potential eos the design can intoduce into measuement systems. To automate the discovey of design defects a genetic algoithm based pocedue has been developed. This pocedue uses the digital logic design discoveed with the genetic pogam as pat of the fitness function fo the genetic algoithm. In addition to digital logic making up pat of the fitness function, ules about the design s expected output duing use ae incopoated into the genetic algoithm s fitness. Fo the digital logic example the genetic pogam evese engineeed, the genetic algoithm is able to find a significant defect within 300 geneations. ACKNOWLEDGEMENTS This wok was sponsoed by the Office of Naval Reseach. The authos would also like to acknowledge D. Jeffey Heye fo useful convesations about and suppot fo an unelated application of genetic algoithms. REFERENCES 1. J.P. Bigus, Data Mining with Neual Nets, Chapte 1, McGaw-Hill, New Yok, 1996. 2. J.R., Koza, F.H. Bennett III, D. Ande, and M.A. Keane, Genetic Pogamming III: Dawinian Invention and Poblem Solving. Chapte 2, Mogan Kaufmann Publishes, San Fancisco, 1999. 3. James F. Smith, III, Fuzzy logic esouce manage: eal-time adaptation and self-oganization, Signal Pocessing, Senso Fusion, and Taget Recognition XIII, I. Kada, Vol. 5429, pp. 77-88, SPIE Poceedings, Olando, 2004. 4. James F. Smith, III, Fuzzy logic esouce manage: decision tee topology, combined admissible egions and the self- 240 Poc. of SPIE Vol. 5812

mophing popety, Signal Pocessing, Senso Fusion, and Taget Recognition XII, I. Kada, pp. 104-114, SPIE Poceedings, Olando, 2003. 5. James F. Smith, III, Fuzzy Logic Resouce Manage: Evolving Fuzzy Decision Tee Stuctue that Adapts in Real- Time, Poceedings of the Intenational Society of Infomation Fusion 2003, X. Wang, pp. 838-845, Intenational Society of Infomation Fusion Pess, Cains, Austalia, 2003, 6. D.E. Goldbeg, Genetic Algoithms in Seach, Optimization and Machine Leaning, Addison-Wesley, Reading, 1989. 7. J.F. Smith, III and R. Rhyne, II, A Resouce Manage fo Distibuted Resouces: Fuzzy Decision Tees and Genetic Optimization, Poceeding of the Intenational Confeence on Atificial Intelligence, IC-AI 99, H. Aabnia, Vol. II, pp. 669-675, CSREA Pess, Las Vegas, 1999. 8. James F. Smith, III and Andew Blank; Co-evolutionay data mining fo fuzzy ules: automatic fitness function ceation, phase space, and expeiments, Data Mining and Knowledge Discovey: Theoy, Tools, and Technology V, B. Dasaathy, pp. 59-70, SPIE Poceedings, Olando, 2003. 9. James F. Smith, III, Co-evolutionay Data Mining to Deal with Adaptive Advesaies, Poceeding of the Intenational Confeence on Machine Leaning and Applications, H. Aabnia, pp. 3-9, CSREA Pess, Las Vegas, 2003. 10. James F. Smith, III; Robet D. Rhyne II and Kistin Fishe; Data mining fo multi-agent ules, stategies and fuzzy decision tee stuctue, Data Mining and Knowledge Discovey: Theoy, Tools, and Technology IV, B. Dasaathy, pp. 386-397, SPIE Poceedings, Olando, 2002. 11. James F. Smith, III; Fuzzy Logic Resouce Management in a Multi-agent Advesaial Envionment ; Poceeding of the Intenational Confeence on Infomation and Knowledge Engineeing, IKE 2002, H. Aabnia, pp. 234-240, CSREA Pess, Las Vegas, 2002. 12. James F. Smith, III; Genetic Pogam Based Data Mining to Discove Fuzzy Rules, Poceeding of the Intenational Confeence on Atificial Intelligence, H. Aabnia, pp. 502-508, CSREA Pess, Las Vegas, 2002. 13. J.F. Smith, III and R.D. Rhyne, II, Genetic Algoithm Based Optimization of a Fuzzy Logic Resouce Manage: Data Mining and Co-evolution, Poceeding of the Intenational Confeence on Atificial Intelligence, IC-AI 2000, H. Aabnia, Vol. I, pp 421-428, CSREA Pess, Las Vegas, 2000. 14. J.F. Smith, III and R. D. Rhyne II, "Fuzzy logic esouce manage: tee stuctue and optimization", Signal Pocessing, Senso Fusion, and Taget Recognition X, I. Kada, Vol. 4380, pp.312-323, SPIE Poceedings, Olando, 2001. 15. James F. Smith, III, Fuzzy logic esouce manage: multi-agent techniques, fuzzy ules, stategies and fuzzy decision tee stuctue, Signal Pocessing, Senso Fusion, and Taget Recognition XI, I. Kada, Vol. 4729, pp. 78-89, SPIE Poceedings, Olando, 2002. 16. S, Luke and L. Panait, Fighting Bloat With Nonpaametic Pasimony Pessue, Paallel Poblem Solving fom Natue - PPSN VII. 7th Intenational Confeence Poceedings, J.J.M. Guevos, LNCS Vol.2439, pp. 411-421, Spinge- Velag, Belin, 2002. 17. J. Gibbin, Companion to the Cosmos, p. 299, The Oion Publishing Goup Ltd, London, 1996. Poc. of SPIE Vol. 5812 241

SOURCE 1 SOURCE 2 SOURCE 3 SENSOR 1 SENSOR 3 S 1 SENSOR 2 S 1 +S 2 +S 3 S 3 SUM SUM SUM DIFF H 1 MAX 2 1 1 2 H 1 DIFF H 3 H 2 H 2 H 3 AND3 AND3 ORDELAY3 ORDELAY3 OR2 Figue 1: A system of thee sensos designed to measue signals fom souce two while minimizing the coupting influences of signals fom souces one and thee. 242 Poc. of SPIE Vol. 5812