Data Mining Approaches to Characterize Batch Process Operations

Rodolfo V. Tona V., Antonio Espuña and Luis Puigjaner*
Universitat Politècnica de Catalunya, Chemical Engineering Department, Diagonal 647, 08028 Barcelona, Spain.

Abstract

In this work, an approach to mine data from batch process operations is presented. The aim is to extract knowledge from data and to support the design or redesign of monitoring systems. Multivariate models at the recipe level and at lower levels are obtained with a multiscale PCA (MsPCA) approach. Fuzzy clustering is then used to help identify the operating conditions of each product recipe. Cluster membership information is used to define effective rules that help characterise the operation of the plant for future productions. Special attention is given to handling time-varying trajectories and to capturing their associated dynamics with the multiscale PCA. An example based on a real pilot plant is used for illustrative purposes.

Keywords: Data Mining, Batch Process, MsPCA, Fuzzy Clustering.

1. Introduction

Nowadays, large amounts of process operational data are recorded in chemical plants. It has been recognised that these data have great potential to provide insight into the process (Stockill, 2002). Advanced data analysis tools and methods are therefore required. In particular, there is a call for adopting and applying Data Mining approaches that help extract useful knowledge from data (Stockill, 2002; Wang, 2001). Some recent proposals have been made to support the monitoring of continuous and batch processes. For batch processes, existing multivariate methods such as Multiway PCA (MPCA) and PLS (Nomikos et al., 1995) have been proposed to obtain reduced characterisations of productions. However, in these methods, issues such as time-varying operations, outliers, production by recipes and the transient dynamics of trend variables are not well treated or solved. Moreover, issues such as the multiscale nature of the data have only been considered separately.
In the case of MPCA, it is assumed that the operating times of all batches are equal. However, this is not true in many real applications. Extensions to solve the time-varying problem have been proposed that align (resample) the variables with reference to an indicator variable using dynamic time warping techniques (Kassidas et al., 1998). The disadvantage is that the multiscale features of the variables are not taken into account. Additionally, some variables may not be available at the corresponding resampling intervals. Chen and Liu (2001) use orthonormal basis functions to represent the variable profiles and then build an MPCA model over the coefficients of the orthonormal functions. This last approach is better suited to the time-varying problem, but important issues such as the selection of the functions and the multiscale nature of the data are not considered.

Clustering has also been proposed for batch operation (Yuan et al., 2001). It is used together with PCA to support the identification of operating conditions and the design of the monitoring system. Operation by recipes is also considered in the analysis. Nevertheless, time-varying operation, outliers and multiscale features are not considered there. As a consequence, the identification and the resulting monitoring model will be suboptimal.

In this work, an alternative approach is proposed to explore batch operational data and to assist in the design or redesign of monitoring systems. The integration of orthonormal basis functions with PCA is achieved by using wavelets. The resulting MsPCA is combined with fuzzy clustering. The identification with clustering allows the generation of operation rules that improve the knowledge of the process and, together with the MsPCA, serve as a monitoring system.

* To whom correspondence should be addressed. E-mail: luis.puigjaner@upc.es

2. Proposed Approach

2.1 Multiscale Modelling of Data

Multivariate statistical techniques have been extensively used for monitoring batch processes. MPCA is one of the best-known methods. To apply this method, it is assumed that the experimental data form a three-dimensional array, $X_o$, of size $I \times J \times K$, where $J$ variables are measured at $K$ times in each of the $I$ batches. $X_o$ can then be unfolded into a large two-dimensional matrix, $X$, of size $I \times JK$ (Figure 1), and PCA can be applied to this unfolded matrix. The method assumes an equal operation time ($K$) for all the batches, which prevents its application to cases where batches differ in duration.

Figure 1. Unfolding of the three-way batch data set (I batches × J variables × K observations) into the I × JK matrix X.

To overcome this problem, we adopt an approach based on function approximation (Chen et al., 2001).
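The batch-wise unfolding of Figure 1 can be illustrated with a minimal sketch. The nested-list layout `x3[i][j][k]`, the function name `unfold_batchwise` and the toy numbers are assumptions made for illustration only; the point is that every batch must supply the same number of samples K, which is the limitation discussed above.

```python
def unfold_batchwise(x3):
    """Unfold an I x J x K nested list into an I x (J*K) matrix.

    x3[i][j][k] is variable j at sample k of batch i (illustrative layout).
    Each output row holds the K samples of variable 1, then variable 2, ...
    """
    n_vars = len(x3[0])
    n_times = len(x3[0][0])
    rows = []
    for batch in x3:
        # MPCA-style unfolding only works when all batches share J and K
        if len(batch) != n_vars or any(len(p) != n_times for p in batch):
            raise ValueError("unfolding needs the same J and K in every batch")
        rows.append([value for profile in batch for value in profile])
    return rows


# Toy data: I = 2 batches, J = 2 variables, K = 3 samples
toy = [
    [[1, 2, 3], [10, 20, 30]],
    [[4, 5, 6], [40, 50, 60]],
]
unfolded = unfold_batchwise(toy)
```

A batch with a different number of samples raises `ValueError` here, which is exactly the case that the function approximation approach is meant to handle.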
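In the function approximation approach, the matrix $X$ is obtained as in MPCA, but the resulting matrix is ordered as follows:

$$
[X]_{I \times JK} =
\begin{bmatrix}
x_{1,1}(k=1) & x_{1,1}(k=2) & \cdots & x_{1,1}(k=K) & \cdots & x_{1,J}(k=1) & x_{1,J}(k=2) & \cdots & x_{1,J}(k=K) \\
\vdots & & & \vdots & & \vdots & & & \vdots \\
x_{I,1}(k=1) & x_{I,1}(k=2) & \cdots & x_{I,1}(k=K) & \cdots & x_{I,J}(k=1) & x_{I,J}(k=2) & \cdots & x_{I,J}(k=K)
\end{bmatrix}
\tag{1}
$$

Here $x_{i,j}(k)$ denotes the value of variable $j$ at sampling time $k$ in batch $i$.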

In the above matrix, each row segment over $k = 1, \dots, K$ is the profile of variable $x_j$ in batch run $i$. These profiles can be represented by functions $f_{i,j}(t)$. Chen and Liu (2001) proposed the use of approximation functions to obtain $f_{i,j}(t)$. Approximation functions constitute sets of orthonormal bases with very good properties for representing signals. They allow $f(t)$ to be represented as:

$$
f(t) \approx \sum_{n=0}^{N-1} c_n \phi_n(t) \tag{2}
$$

where $C_N = \{c_n\}$, $n = 0, 1, \dots, N-1$, are the coefficients and $\{\phi_n(t)\}$ is a set of square-integrable functions. Then, based on Lagrange polynomial functions, equation (1) becomes:

$$
[X]_{I \times JK} =
\begin{bmatrix}
f_{1,1}(t) & \cdots & f_{1,J}(t) \\
\vdots & & \vdots \\
f_{I,1}(t) & \cdots & f_{I,J}(t)
\end{bmatrix}
=
\begin{bmatrix}
[c]_{I \times N_1} & [c]_{I \times N_2} & \cdots & [c]_{I \times N_J}
\end{bmatrix}
\tag{3}
$$

where $N_j$ is the number of basis functions required in $f_{i,j}(t)$ to approximate measurement $j$, and $[c]_{I \times N_j}$ is the trajectory coefficient matrix of measurement $j$, which has $N_j$ columns. $N_j$ is always the same under normal operation. Applying PCA to $X$ therefore gives a good representation of batches with different durations, because every batch is described by the same number of coefficients whatever its length.

The above solution strategy works at a single scale. However, the multiscale nature of chemical data has been widely recognised (Bakshi, 1998). Also, Lagrange polynomials are not an obvious choice. In this work, both issues are addressed by using wavelets. Wavelets are families of functions with very good properties as orthonormal bases. By combining approximations at different scales, they can capture both fine details and trends accurately. The resulting approximation is expressed as:

$$
f(t) = \sum_{m \ge 1} \sum_{k} d_{mk} \, \psi_{mk}(t) \tag{4}
$$

where $d_{mk}$ plays the role of the $c_n$ coefficients of equation (2) at the $m$-th scale and $\psi_{mk}$ is the $k$-th basis function at the $m$-th scale. Regarding the selection of the function, Daubechies wavelets are used because of their very good capability to represent polynomial behaviour. The extraction of functions with wavelets is achieved through de-noising (Nounou et al., 1999).
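Wavelet de-noising can be sketched in a few lines under simplifying assumptions. The sketch below uses the Haar wavelet (the simplest member of the Daubechies family) rather than any particular higher-order Daubechies wavelet, and soft thresholding with a hand-picked threshold, which is not necessarily the rule used in this work. All function names and the toy signal are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Return (approximation, [details at scale 1, scale 2, ...])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, going from the coarsest scale to the finest."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(signal, detail):
            rebuilt.extend([(s + d) / SQRT2, (s - d) / SQRT2])
        signal = rebuilt
    return signal


def soft_threshold(coeffs, threshold):
    """Shrink coefficients towards zero; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, levels, threshold):
    """Threshold the detail coefficients and rebuild the signal."""
    approx, details = haar_decompose(signal, levels)
    details = [soft_threshold(d, threshold) for d in details]
    return haar_reconstruct(approx, details), approx, details


# Toy profile: a noisy step from about 1 to about 3 (made-up numbers)
noisy = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
clean, approx, details = denoise(noisy, levels=2, threshold=0.2)
```

The retained approximation and thresholded detail coefficients (`approx`, `details`) are the kind of quantities that would be collected, batch by batch, into the coefficient matrix on which PCA is applied.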
De-noising removes the effect of noise, which clearly benefits the subsequent analysis. The matrix $X$ is therefore built from the de-noised coefficients $d_m$, and PCA is applied to it. There is a clear difference with the multiscale PCA proposed by Bakshi (1998): here, PCA is built only over the complete wavelet coefficient matrix, and de-noising is carried out before PCA.

2.2 Fuzzy Clustering

Clustering techniques attempt to assess the relationships among data patterns belonging to different groups. In this work, fuzzy clustering is adopted to identify operating region patterns and as a basis for rule generation. Fuzzy c-means is used as the clustering technique. This algorithm automatically identifies the centre of each cluster and calculates the membership value of each data case in each cluster. It is based on the minimisation of the sum of squared Euclidean distances between the data ($x_k$, $k = 1, \dots, n$) and the cluster centres ($v_i$, $i = 1, \dots, c$):

$$
\min J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^m \, \lVert x_k - v_i \rVert^2 \tag{5}
$$

where $m > 1$ is the fuzziness index, $c$ is the number of clusters, and $U = [\mu_{ik}]$ is the matrix of a fuzzy c-partition. The fuzzy c-partition is constrained as follows:

$$
\mu_{ik} \in [0, 1] \;\; \forall i, k; \qquad \sum_{i=1}^{c} \mu_{ik} = 1 \;\; \forall k; \qquad 0 < \sum_{k=1}^{n} \mu_{ik} < n \;\; \forall i. \tag{6}
$$

In other words, each $x_k$ can belong to more than one cluster, with each membership ($\mu$) taking a fractional value between 0 and 1. The details of the algorithm are omitted for space reasons (Bezdek, 1981). However, it should be noted that the result depends on the number of clusters $c$. Also, because the objective function is based on Euclidean distances, the method tends to identify only spherical clusters. In this work, the Mahalanobis distance is used in equation (5) to extend the identification to both spherical and ellipsoidal clusters. Additionally, a simple algorithm, the mountain method (Yager et al., 1994), is used as a pre-estimator of $c$ for cases where it is not known a priori.

Rules generation

In the above methods, each cluster centre is in essence a prototypical data point that exemplifies the characteristic behaviour of the system. The membership information, $\mu$, of each data point then allows it to be associated with a pattern of the system. Simple rules can thus be generated by extracting the largest membership of each point and relating it to a pattern as follows:

If $\mu_{J1}$ is A then {$C_1$ = pattern 1} is produced.
If $\mu_{J2}$ is B and $\mu_{J2}$ is D then {$C$ = pattern} is produced.

Here, A, B and D can represent values of variables such as temperature, and patterns can express an associated operating condition for a recipe. As a consequence, more insight into the process can be obtained and used to design a monitoring system.

2.3 Global procedure. Data mining approaches.

The above methods are combined to extract knowledge from batch process data. The way in which the data are analysed is determined by two important issues: production by recipes and the development of the operation by stages.
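Before the two analysis levels are described, the clustering and largest-membership steps of Section 2.2 can be illustrated with a minimal sketch. It uses the plain Euclidean distance of equation (5) rather than the Mahalanobis distance adopted in this work, takes the initial centres from the caller instead of the mountain method, and applies the standard alternating updates of memberships and centres. Function names, default parameters and toy data are illustrative assumptions.

```python
import math


def fuzzy_c_means(points, centres, m=2.0, n_iter=100, tol=1e-9):
    """Alternate membership and centre updates for the objective in eq. (5).

    points  : list of equal-length coordinate lists
    centres : initial centres supplied by the caller (an assumption here)
    Returns (centres, memberships) with memberships[i][k] = mu_ik.
    """
    centres = [list(v) for v in centres]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: mu_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        memberships = [[0.0] * len(points) for _ in centres]
        for k, x in enumerate(points):
            dists = [math.dist(x, v) for v in centres]
            if min(dists) == 0.0:
                # Point coincides with a centre: full membership there
                memberships[dists.index(0.0)][k] = 1.0
                continue
            for i, d_i in enumerate(dists):
                memberships[i][k] = 1.0 / sum((d_i / d_j) ** power for d_j in dists)
        # Centre update: weighted mean of the points with weights mu_ik ** m
        new_centres = []
        for mu_i in memberships:
            weights = [mu ** m for mu in mu_i]
            total = sum(weights)
            new_centres.append([
                sum(w * x[dim] for w, x in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships


def strongest_cluster(memberships):
    """Index of the cluster with the largest membership, for every point."""
    n_points = len(memberships[0])
    return [max(range(len(memberships)), key=lambda i: memberships[i][k])
            for k in range(n_points)]


# Toy data: two well-separated groups of three points each (made-up numbers)
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centres, mu = fuzzy_c_means(points, centres=[[1.0, 1.0], [4.0, 4.0]])
labels = strongest_cluster(mu)
```

The `labels` list corresponds to the "largest membership" used for rule generation; relating each label to variable ranges such as A, B and D is left out because it depends on the plant data.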
Two levels of analysis are proposed: one at the recipe level (entire batch) and another at the stage level (sub-step). At the first level, the overall data from different batches are used. A matrix, $X_c$, with all process variables is built together with a separate matrix of important quality variables, $X_{Qi}$. Wavelet coefficient matrices are obtained for each matrix ($X_c$, $X_{Qi}$), and the resulting matrices are processed with PCA. Reduced representations (patterns) of the batches and the profile of each $X_{Qi}$ are obtained. Next, clustering is applied, giving groups of patterns and their memberships $\mu$ in those groups. Groups of one or two objects are rejected as outliers, together with their respective coefficients in $X_c$ and $X_{Qi}$, and the PCA models are obtained again. This rejection step eliminates possibly abnormal batches and selects the data of good batches. When no more rejections occur, the groups of operations are defined. It can occur that some recipes are grouped in the same cluster
which suggests similar operating conditions and, possibly, a single model for these recipes.

In a second step, the data set obtained in the previous step is used, and the recipes identified as similar are analysed together. A pre-processing step with wavelets is applied to the variable profiles to identify stages. Then, MsPCA is applied to the individual $X_{pi}$ matrices and to groups of stages in the profiles. The membership information, $\mu$, obtained is mapped onto the $\mu$ information at the recipe level. In this way, simple rules are derived about the conditions in each stage that lead to a given product grade of a recipe. Finally, the data rejected at the recipe level are analysed to identify potentially abnormal operation, which expands the knowledge about the process.

It should be noted that a similar analysis (by levels) has already been proposed by Yuan et al. (2001). They first apply the stage-level analysis to obtain a reduced representation of the variables with the most significant principal components (PCs). Then, PCA is applied over the PCs. However, PCs are not well suited to capturing the dynamic trend information of variables; the use of wavelets proposed here is more appropriate for this task. Moreover, Yuan et al. do not check for abnormal data or consider the presence of outliers, which can mask the identification of groups. Additionally, their approach cannot be applied to time-varying processes.

3. Batch Pilot Plant Application

A real pilot plant at UPC has been selected as the scenario for testing purposes. It contains three reactors, heat exchangers and a highly flexible connectivity between them, achieved via a network of pipes, pumps and valves. It has been used to generate data for several product recipes. A total of 24 experiments with different operation times were generated. First, the analysis at the recipe level is carried out.

Figure 2a. µ values with a bad batch.
Figure 2b. µ values without bad batches.

Membership values ($\mu$) of the batches in the different groups are obtained (Figure 2a). The groups are verified to represent specific recipes. The effect of a bad batch, with a low $\mu$ in recipe 3, is also noticeable. When it is rejected, the definition of the groups improves (Figure 2b). Subsequent analyses help to identify the different operating conditions associated with the existing recipes. Operating rules for each of the recipes are then obtained. Note that, in this application, the process variables were recorded every minute, while the quality
variables were recorded every 5-10 minutes. Because PCA is applied to the coefficients of the wavelet approximation of each signal, this small difference in sampling is not limiting. However, it should be noted that the method cannot be applied in cases with larger differences between sampling frequencies.

4. Conclusions

A new methodology to explore batch processes has been proposed. The methodology is capable of dealing with important issues such as time-varying trajectories and outliers. It is also suited to representing operations, by stages and by recipes, with rules. This last capability is useful for improving the understanding of the process and for the design or redesign of monitoring systems. Data from the pilot plant have served to illustrate the methodology. Nevertheless, additional applications to this and other real scenarios are needed to establish the generality of the method. The problem of different sampling frequencies also requires further study. Finally, the approach appears potentially useful for building root cause analysis databases to support tasks such as equipment maintenance or scheduling. All this will be explored in future work.

Acknowledgment

Financial support from the Generalitat de Catalunya (FI research grant for Tona, R.V.) and from the European Community (projects VIPNET-GRD-CT-2000-00318 and CHEM-GRD-CT-2001-00466) is gratefully acknowledged.

References

Bakshi, B. R., 1998, AIChE Journal, 44, 7, 1596-1610.
Bezdek, J., 1981, Pattern recognition with fuzzy objective function algorithms, Plenum, N.Y.
Chen, J., and Liu, J., 2001, Chem. Eng. Sci., 56, 10, 3289-3304.
Kassidas, A., MacGregor, J.F., and Taylor, P., 1998, AIChE Journal, 44, 4, 864-875.
Nomikos, P., and MacGregor, J.F., 1995, Technometrics, 37, 1, 41-59.
Nounou, M. N., and Bakshi, B. R., 1999, AIChE Journal, 45, 5, 1041-1058.
Stockill, D., 2002, ESCAPE-12 (Eds. Grievink, J., Schijndel, J.), Elsevier, 70-77, Amsterdam, Netherlands.
Wang, X.Z., 2001, Application of Neural Networks and other Learning Technologies in Process Engineering, Imperial College Press, London.
Yager, R.R., and Filev, D.P., 1994, IEEE Trans. on Syst., Man, & Cyb., 24, 8, 1279-1284.
Yuan, and Wang, X.Z., 2001, Chem. Eng. Comm., 185, 201-221.