NEURAL-NETWORK MODELING

Neural-network modeling tools enable the engineer to study and analyze the complex interactions between material and process inputs, with the goal of predicting final component properties.

David Furrer*
Ladish Co. Inc.
Cudahy, Wisconsin

Stephen Thaler
Imagination Engines Inc.
Maryland Heights, Missouri

Fig. 1 Predicted cross-sectional tensile yield strength for an example forged titanium Ti-64 component. The image is the result of a neural-network model linked to a Scientific Forming Technologies DEFORM finite element model. (The strength contours are in ksi.)

Neural-network models are mathematical tools designed to map input patterns to output patterns, with the overall goal of minimizing the error between modeled and measured output values. Quite a variety of neural-network models have been designed to fit a range of processes and materials, but this plethora of choices can confuse potential users, sometimes to the point of inhibiting application altogether. Neural-network models have multiplied in particular for manufacturing and metallurgical engineering. Initial application of neural-network modeling to forging processes was conducted under the U.S. Air Force-sponsored Forging Supplier Initiative, and it continues under the U.S. Air Force-sponsored Metals Affordability Initiative.

A significant amount of mathematics supports each type of neural-network structure. These are inherently complex mathematical models, and it has been challenging to win acceptance from practical, non-mathematician engineers who consider math a tool and not an end in itself. Efforts at Imagination Engines Inc. have resulted in a modeling tool with a user-friendly interface for inputting data, developing models, and analyzing results. Called PatternMaster, it enables engineers to develop and apply neural-network models on a desktop computer (Fig. 1).
In addition, many of the possible neural-network options can be pre-selected to provide useful, fast, and straightforward application. Also, an optimization routine can be utilized that automatically seeks and develops optimum model configurations. This article discusses various neural-network models, and then shows how PatternMaster may be applied to develop products quickly and accurately. *Fellow of ASM International 42 ADVANCED MATERIALS & PROCESSES/NOVEMBER 2005
NETWORK MODELING

Neural-network models
Neural-network models include Perceptrons, Radial Basis Functions, Probabilistic Neural Networks, Generalized Regression Neural Networks, and several others. Of these, the Perceptron models are the most common and can be tailored for nearly any application. The name of this model type does not help its acceptance among those unfamiliar with neural networks: the term Perceptron suggests images of the brain or some neuroscientific construct, while in fact it is simply a computational program with inputs and outputs. It can be regarded graphically as a collection of nodes arranged in a series of layers. When a perceptron has more than two layers of nodes, it is called a Multilayer Perceptron, or MLP. A node can be schematically drawn as a point with inputs, outputs, and an activation function (Fig. 2).

Fig. 2 Schematic configuration of a node within a neural-network model, where In represents inputs, with two shown here. The W represents connection weights, Out is the output, and θ is the bias to the node.

Multilayer model
The layers in a simple perceptron model consist of an input layer (which contains a node for each input data parameter) and an output layer (which contains a node for each resultant data parameter). This arrangement is suitable for linear regression analyses of datasets. In reality, many real-world relationships are nonlinear and may involve synergistic effects among several input parameters. Simple linear regression modeling therefore does not accurately represent the general relationships between a series of inputs and outputs. To handle this higher level of complexity, additional layers are added to the simple perceptron. Each node in an added layer relates to the prior layer and to the subsequent layer through connections. The added layer or layers sandwiched between the initial input and output layers are called hidden layers. This structure, shown graphically in Fig. 3, allows very complex equations to fit the relationship between the inputs and the outputs. The larger the number of hidden layers and of nodes on each layer, the more capable the MLP will be of absorbing complex relationships. Fortunately, the form of the developed relationship is not needed prior to model construction, although it is best to attempt to model datasets with minimal layers and nodes.

Fig. 3 This schematic shows a three-layer neural-network model, with an input layer, a hidden layer, and an output layer. The middle layers are called hidden layers.

Network nodes
The nodes in a neural-network model connect to all prior and subsequent nodes in the model. The connections are given values called weights. Each node computes an output value based on the incoming weights and an activation function. The calculation of the output value (often called a signal) and the form of the activation function distinguish the various types of neural-network models. The most common types of models form a weighted sum of the inputs and weights feeding a node, and this sum is passed along to an activation function. The most common activation function is the sigmoid. The sigmoid serves to switch any given node between low and high states to help model nonlinear behaviors, while the ramp connecting the low and high regions assists in modeling linear relationships.

Training neural-network models
The process of training is aimed at developing the relationship that best fits the general function between the input and output parameters. The error between predicted and actual output values is measured as each record within a dataset is passed through the neural network. The entire set of individual errors then establishes an error surface, and the training algorithm updates the connection weights so as to locate the minimum in that error surface.

Fig. 4 This graph shows training within a neural network, plotted as output parameter versus input parameter. Relationship A shows that the model is under-trained, and relationship C shows over-training. The optimal general relationship is shown at relationship B.

In multilayer neural-network models, the input data is passed in a feed-forward manner, shown as left to right in Fig. 3. Initially, the connection weights are set to random values. As the datasets are passed through the model, an error is calculated between the predicted outputs and the desired outputs. The corrections to all of the weights within the neural net are chosen so as to descend as rapidly as possible into the valleys of the error surface, in a process known as gradient descent. Working through such gradients mathematically, the update to any given weight is the product of the net's output error (appropriately weighted by all the connection weights leading back to the neuron it feeds), the first derivative of the recipient neuron's activation function in the neighborhood of its current state, and the raw signal passing through that weight. An additional multiplicative constant called the learning rate can speed up or slow down the traversal of such gradients.

The magnitude of the learning rate is important in allowing the network weights to assume values that produce global minima, rather than local-error minima. High rates of training provide large changes from iteration to iteration based on the errors calculated, but can also prevent resolution of the global minimum. The more complicated the model (i.e., the more hidden layers and nodes per layer), the greater the number of local minima.
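As a concrete illustration, the weight-update rule just described, output error times the activation function's derivative times the raw input signal, scaled by a learning rate, can be sketched in a few lines of code. This is a minimal, hypothetical example (a single sigmoid node fitted to a made-up five-point dataset), not the formulation of any particular commercial tool:

```python
import math
import random

def sigmoid(x):
    # Common activation: switches the node between low (0) and high (1) states
    return 1.0 / (1.0 + math.exp(-x))

# Invented toy dataset: one input parameter, one output parameter
data = [(0.0, 0.1), (0.25, 0.3), (0.5, 0.5), (0.75, 0.7), (1.0, 0.9)]

random.seed(0)
w, b = random.random(), random.random()   # weight and bias start at random values
learning_rate = 0.5

for epoch in range(5000):
    for x, target in data:
        out = sigmoid(w * x + b)          # feed-forward signal through the node
        error = out - target              # predicted minus desired output
        # Gradient descent: error x sigmoid derivative x raw input signal
        grad = error * out * (1.0 - out)
        w -= learning_rate * grad * x
        b -= learning_rate * grad

# After training, the node's predictions should track the targets
print(sigmoid(w * 0.5 + b))   # close to the mid-range target of 0.5
```

Raising or lowering `learning_rate` here shows the trade-off the article describes: large values make big jumps that can overshoot a minimum, while small values converge slowly.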
Therefore, the simplest model that works for an application will be the safest to train, both to avoid false minima and to determine the global minimum. Low rates of training can be a problem with complex models having large numbers of local minima, because the model may not be able to escape from a local minimum with such small jumps. A momentum term is therefore also included in the error-correction term. If a correction runs in the same general direction for several correction iterations, the subsequent corrections gain momentum, which can allow low rate-of-training models to escape from local minima into global minima.

Automated training
Training a neural network is an iterative process that occurs automatically within the training algorithm. Training follows these steps:

Input: A set of training data is input into the model. The program processes each record and provides iterative corrections to the network's connection weights.

Training errors: During this process, the training error is minimized. The training error is defined as the error between the modeled outputs and the outputs in the training dataset. After the training error is minimized, any implicit relationships between input and output patterns are absorbed into the neural network.

Optimal training: Optimally trained neural-network models describe a relationship that accurately represents the general correlation between the input and output parameters. If a model is under-trained, the general relationship may not be determined and therefore cannot be represented by the collection of connection weights within the model. On the other hand, if the model is over-trained, it will model the behavior of the training examples well, but might depart from the overall general relationship. To show this graphically, Fig. 4 shows a set of plotted data points. The data may have noise in it, and is therefore not exact.
A model of the general relationship of this data may best be a smooth line (B), but if the model is over-trained, a complex, higher-order relationship may be developed that fits the example training data well, yet causes problems when the model is applied to other examples it is expected to cover. Multilayer neural-network models with the minimum number of hidden layers and nodes per layer will be resistant to over-training. The more complex the neural-network structure, the more capable the model is of forming complex, and possibly non-physical, general relationships. Conversely, simpler neural-network forms cannot depart far from simple, low-order relationships.

Goal of training
The goal of the training process is not simply to minimize the training error. Instead, the goal is to minimize error when the model is used with data that was not used for training (i.e., set-aside data). This means that to correctly train a multilayer neural-network model, an available dataset should be divided into two subsets: a training set and a testing set. The training set trains the model, with progressive reduction in training error over successive iterations. The testing set serves to assess the so-called generalization error on a random population of representative data that was not part of the training of the model. The calculated average error between predicted and actual values in the testing dataset is evaluated to determine whether the model is properly trained.

Continued assessment of the training and generalization errors will show a decrease with time, but if the model becomes over-trained, the generalization error will start to increase with continued training. This is because the model is memorizing the pattern of the examples instead of gleaning the overall general pattern. In addition, noise in the training data, from data measurement errors or the like, will become part of an over-trained model, and the testing dataset will most likely not fit exactly with the model developed from the training dataset. Once the model structure is established and the model is optimally trained, the testing dataset is used to confirm the accuracy and acceptability of the model.

It is important to note that a trained multilayer neural-network model is typically good only at predicting outputs from inputs that are within the range of the training dataset. Some extrapolation can be done, with caution, by adjusting the scaling factor in the activation function of each neuron.

Tailorable structures
The previous discussions of neural networks are only high level and are not complete in any form or fashion.

PatternMaster software
The PatternMaster software package, developed by IEI Inc., has several important features, including:
- An XML-based script to describe details of the network architecture, training parameters, and file I/O; and
- A three-dimensional virtual-reality display of the neural net to assist in visualization of critical factors and underlying schema.

The program is extremely fast and efficient at training due to its state-of-the-art model engine and IEI's patented STANNO (Self-Training Artificial Neural-Network Object) technology. Furthermore, a neural network built into the software automatically trains the neural network of interest, rather than requiring the engineer to do this manually. The trainer net learns by experience how to correct the weights of the trainee net.
As a result, this training technique is much faster than traditional learning schemes such as conventional back-propagation.

PatternMaster has five main user-interface functions: Model Development Wizard, XML Program View, Network View, Input/Output Prediction Visualization, and Data View. The model development wizard creates the necessary XML training script and links in the relevant training data. After this operation, training of the neural-network model can begin. The network view (Fig. 5) shows the input, hidden, and output layers, as well as the associated connections. Through simple mouse clicks and drags, the user may quickly determine which input parameters are critical to a given output parameter, based on the trained model.

The software also allows the user to assess any possible combination of inputs within the range of the training dataset to determine their effect on output parameters. This can be done, one set of input data points at a time, in the Input/Output Visualization screen, allowing the user to quickly assess interactions of input variables and their effects on output predictions.

PatternMaster also provides program files for the trained neural network, which can be linked to other programs or run as a standalone tool. The ability to export a program that emulates the trained neural network is important and extremely useful. It allows generation of the trained neural network in Excel, Java, C++, Fortran, and other codes. These output codes can be linked to other engineering tools, such as DEFORM, to allow prediction and visualization of forged-component properties for any set of input processing parameters.

PatternMaster is thus a useful engineering tool for developing and applying neural-network modeling. The software is programmed to provide an optimum set of neural-network parameters (layers, nodes, training rates, momentums, etc.), which do not need to be set by an engineer.

Fig. 5 An example of a PatternMaster network view showing the layers, nodes, and connections is shown in A. The skeleton view in B indicates the most significant parameters that affect ultimate tensile strength, where UTS is directly related to Cooling Rate (CR) and indirectly related to Solution Temperature.
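Conceptually, an exported network of the kind described above reduces to a small standalone routine that applies the stored weights in a feed-forward pass. The sketch below is a hand-written, hypothetical illustration, with invented weights and layer sizes, not actual PatternMaster output:

```python
import math

# Hypothetical trained values for a tiny 2-input, 2-hidden-node, 1-output net.
# A real exported model would carry its actual trained weights and biases.
W_HIDDEN = [[0.8, -0.4], [0.3, 0.9]]   # W_HIDDEN[node][input]
B_HIDDEN = [0.1, -0.2]                 # one bias per hidden node
W_OUT = [1.2, -0.7]                    # hidden-to-output weights
B_OUT = 0.05                           # output-node bias

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(inputs):
    # Feed-forward pass: weighted sum plus bias at each node, squashed by sigmoid
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(W_HIDDEN, B_HIDDEN)]
    return sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)

print(predict([0.5, 0.5]))
```

Because the routine is just arithmetic over fixed constants, the same structure translates directly into a spreadsheet formula or a Fortran, Java, or C++ function, which is what makes the export capability practical for linking to other engineering tools.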
It is clear that the structure of neural-network models is very tailorable, which is good from the standpoint of flexibility, but is a negative from the standpoint of usability. For a neural-network modeling tool to be practical for engineers, the tool must guide users through setup of the most appropriate model architecture and through execution of neural-network model training. No single setting will be perfect for all modeling applications and datasets, but IEI has established a modeling program, PatternMaster, in which many of the complex modeling parameters are pre-set. This program also provides an automated function for developing optimum modeling parameters: a wizard tool walks users through establishing an optimization routine that seeks an optimal model architecture and training parameters.

Training models
For training a neural-network model, a rule of thumb says that a minimum of one record in a dataset is needed for every adjustable neural-network value (connections plus node biases). This means that a model with three inputs, four outputs, and a single hidden layer of eight nodes requires a minimum of 68 training records (56 connections and 12 biases to the hidden- and output-layer nodes). For optimum model development, it is helpful to have larger quantities of training data, which can be many times the smallest estimated minimum; less training data results in lower fidelity of the general relationship. Models having a large number of input/output parameters have nevertheless been successfully trained with small amounts of data to determine which input variables contribute most significantly to the output. Once this is known, new models can be developed with a greatly reduced number of input parameters in the model.
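The record-count arithmetic behind this rule of thumb is easy to reproduce. The short calculation below counts the connections and biases for the same hypothetical 3-input, 8-hidden-node, 4-output network:

```python
def min_training_records(n_inputs, n_hidden, n_outputs):
    # One connection for every input-to-hidden and hidden-to-output pair
    connections = n_inputs * n_hidden + n_hidden * n_outputs
    # One bias for every hidden-layer and output-layer node
    biases = n_hidden + n_outputs
    # Rule of thumb: at least one training record per adjustable value
    return connections + biases

print(min_training_records(3, 8, 4))   # 56 connections + 12 biases = 68
```

The same count explains why trimming insignificant inputs helps so much when data is scarce: dropping one input removes an entire column of hidden-layer connections from the total.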
This can allow for increased model accuracy in the relationships between the most significant factors when limited data is available.

Successful applications
The literature contains a number of citations regarding neural-network models for manufacturing and metallurgical engineering.

Rolling parameters: Neural-network models have helped to develop a relationship between processing parameters, roll settings, and final steel-plate rolling thicknesses. This industrial application is aimed at reducing scrap and improving quality and yield through proper selection, monitoring, and control of in-process manufacturing parameters.

Fatigue cracks: Neural-network modeling of superalloy fatigue crack growth rate has been successful. These efforts showed that second-stage fatigue crack growth rate could be predicted based on temperature, yield strength, ultimate tensile strength, and Young's modulus. The goal of these efforts is to develop a tool that could guide alloy design toward slower crack-growth-rate materials.

Tensile properties: Another neural-network model presented in the literature predicts the tensile properties of nickel-base superalloys based on alloy chemistry and temperature. This neural-network modeling effort has successfully predicted the tensile strength of a wide range of superalloy chemistries and test temperatures. The most significant input parameters, in order of significance, were temperature, percent titanium, percent aluminum, percent niobium, percent tungsten, percent molybdenum, and percent boron. This effort was also aimed at developing a predictive tool for alloy design and optimization.

Foundation design: Design engineers who develop, manufacture, and evaluate the construction of foundations have successfully applied neural-network models. It was noted in the literature that the neural-network approach to shallow and deep foundation modeling was equal to, and often superior to, that of conventional models.
Geotechnical materials and structures are very complicated, and many features and interactions are not well understood. Conventional modeling requires assumptions about model equation forms, which often leads to errors. The neural-network model established relationships based on available data and did not require assumptions or theories.

Transformation kinetics: Researchers at Queen's University Belfast have developed commercially available trained neural-network models that provide time-temperature-transformation (TTT) kinetic data and mechanical-property data for titanium alloys as a function of chemistry. These tools are presumably trained and tested with literature data.

Metals Affordability Initiative: Current neural-network activities under the Metals Affordability Initiative include modeling of Ti-64 mechanical properties from measured input material compositions, microstructural features, and input processing parameters. Models are being created for Ti-64 at Ladish and OSU. From the input processing data, it was quickly determined that several chemical elements are critical for increasing strength in Ti-64, as are strain and the heat-treat cooling rate. The developed models can provide predicted property results as a function of location within the cross-section of a forged and heat-treated component.

For more information: Dr. David Furrer is Manager, Advanced Materials & Process Technology, Ladish Co. Inc., Cudahy, WI 53110-8902; tel: 414/747-3063; e-mail: dfurrer@ladishco.com; Web site: www.ladishco.com. Dr. Stephen Thaler is Chairman and CEO of Imagination Engines Inc., 11970 Borman Drive, Suite 250, St. Louis, MO 63146-4153; tel: 314/317-2228 x 4428; e-mail: sthaler@imagination-engines.com; Web site: www.imagination-engines.com.