Cluster analysis of 3D seismic data for oil and gas exploration

Similar documents
Parallel Fuzzy c-means Cluster Analysis

Figure (5) Kohonen Self-Organized Map

Chapter 7: Competitive learning, clustering, and self-organizing maps

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

CHAPTER 4 FUZZY LOGIC, K-MEANS, FUZZY C-MEANS AND BAYESIAN METHODS

Seismic facies analysis using generative topographic mapping

Two-step Modified SOM for Parallel Calculation

Fuzzy-Kernel Learning Vector Quantization

Artificial Neural Networks Unsupervised learning: SOM

IMPLEMENTATION OF FUZZY C MEANS AND SNAKE MODEL FOR BRAIN TUMOR DETECTION

A SURVEY ON CLUSTERING ALGORITHMS Ms. Kirti M. Patil 1 and Dr. Jagdish W. Bakal 2

Data Mining. Kohonen Networks. Data Mining Course: Sharif University of Technology 1

Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms.

Self-Organizing Maps for cyclic and unbounded graphs

ECM A Novel On-line, Evolving Clustering Method and Its Applications

Final Exam. Controller, F. Expert Sys.., Solving F. Ineq.} {Hopefield, SVM, Comptetive Learning,

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

6. Dicretization methods 6.1 The purpose of discretization

USING OF THE K NEAREST NEIGHBOURS ALGORITHM (k-nns) IN THE DATA CLASSIFICATION

Machine Learning : Clustering, Self-Organizing Maps

A Short Narrative on the Scope of Work Involved in Data Conditioning and Seismic Reservoir Characterization

Function approximation using RBF network. 10 basis functions and 25 data points.

AUTOMATED STRUCTURE MAPPING OF ROCK FACES. G.V. Poropat & M.K. Elmouttie CSIRO Exploration & Mining, Brisbane, Australia

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Fuzzy C-MeansC. By Balaji K Juby N Zacharias

EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR

Unsupervised learning

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Prototype Selection for Handwritten Connected Digits Classification

arxiv: v1 [physics.data-an] 27 Sep 2007

CHAPTER 3 TUMOR DETECTION BASED ON NEURO-FUZZY TECHNIQUE

11/14/2010 Intelligent Systems and Soft Computing 1

An adjustable p-exponential clustering algorithm

Automatic Group-Outlier Detection

PATTERN RECOGNITION USING NEURAL NETWORKS

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining, 2 nd Edition

Machine Learning Reliability Techniques for Composite Materials in Structural Applications.

Predicting Porosity through Fuzzy Logic from Well Log Data

Seismic regionalization based on an artificial neural network

Automatic Seismic Facies Classification with Kohonen Self Organizing Maps - a Tutorial

Using Decision Boundary to Analyze Classifiers

What is a receptive field? Why a sensory neuron has such particular RF How a RF was developed?

4. Cluster Analysis. Francesc J. Ferri. Dept. d Informàtica. Universitat de València. Febrer F.J. Ferri (Univ. València) AIRF 2/ / 1

Unsupervised Learning

Equi-sized, Homogeneous Partitioning

A Survey on Image Segmentation Using Clustering Techniques

Preprocessing DWML, /33

Semi-Supervised Clustering with Partial Background Information

Fluid flow modelling with seismic cluster analysis

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

An Improvement of Centroid-Based Classification Algorithm for Text Classification

MURDOCH RESEARCH REPOSITORY

Analyzing Outlier Detection Techniques with Hybrid Method

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

The k-means Algorithm and Genetic Algorithm

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

INTERNATIONAL RESEARCH JOURNAL OF MULTIDISCIPLINARY STUDIES

University of Florida CISE department Gator Engineering. Clustering Part 5

A new hybrid fusion method for diagnostic systems

COMPUTATIONAL NEURAL NETWORKS FOR GEOPHYSICAL DATA PROCESSING

Unsupervised Learning

Performance Measure of Hard c-means,fuzzy c-means and Alternative c-means Algorithms

Efficient Object Extraction Using Fuzzy Cardinality Based Thresholding and Hopfield Network

Methods for Intelligent Systems

CS6220: DATA MINING TECHNIQUES

A Topography-Preserving Latent Variable Model with Learning Metrics

Images Reconstruction using an iterative SOM based algorithm.

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

Motion Detection. Final project by. Neta Sokolovsky

9. Lecture Neural Networks

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

Collaborative Rough Clustering

Application of genetic algorithms and Kohonen networks to cluster analysis

Data Mining. Moustafa ElBadry. A thesis submitted in fulfillment of the requirements for the degree of Bachelor of Arts in Mathematics

Multiple Classifier Fusion using k-nearest Localized Templates

CAD SYSTEM FOR AUTOMATIC DETECTION OF BRAIN TUMOR THROUGH MRI BRAIN TUMOR DETECTION USING HPACO CHAPTER V BRAIN TUMOR DETECTION USING HPACO

Customer Clustering using RFM analysis

Keywords - Fuzzy rule-based systems, clustering, system design

Machine-learning Based Automated Fault Detection in Seismic Traces

Clustering with Reinforcement Learning

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

AN IMPROVED MULTI-SOM ALGORITHM

Keywords Fuzzy, Set Theory, KDD, Data Base, Transformed Database.

Modification of the Growing Neural Gas Algorithm for Cluster Analysis

International Journal of Mechatronics, Electrical and Computer Technology

Extract an Essential Skeleton of a Character as a Graph from a Character Image

A NEW ALGORITHM FOR OPTIMIZING THE SELF- ORGANIZING MAP

A Fuzzy Rule Based Clustering

International Journal Of Engineering And Computer Science ISSN: Volume 5 Issue 11 Nov. 2016, Page No.

Web page recommendation using a stochastic process model

Chapter 4 Fuzzy Logic

Nonlinear dimensionality reduction of large datasets for data exploration

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Selection of n in K-Means Algorithm

Slide07 Haykin Chapter 9: Self-Organizing Maps

Supervised vs.unsupervised Learning

Well Analysis: Program psvm_welllogs

Cluster Analysis using Spherical SOM

Transcription:

Data Mining VII: Data, Text and Web Mining and their Business Applications 63 Cluster analysis of 3D seismic data for oil and gas exploration D. R. S. Moraes, R. P. Espíndola, A. G. Evsukoff & N. F. F. Ebecken COPPE/Federal University of Rio de Janeiro, Brazil Abstract Seismic attributes are information extracted from seismic data and constitute important tools to estimate the geological structure of a place, helping the understanding of the subsurface and reducing the uncertainties on interpretations. This comprehension is crucial to tasks such as lithology prediction and reservoir characterization. Seismic attributes are generated by transforming data from a seismic line (two dimensional data) or a seismic volume (three dimensional data). This work presents a study of clustering algorithms to these attributes and the techniques employed follow two distinct approaches: a self organizing map to perform crisp clustering and fuzzy c-means to perform partial clustering. The evaluations of the partitions are performed with the PBM index which indicates the best number of groups. Data from a Brazilian oil field is used to test the algorithms. Keywords: clustering, neural networks, fuzzy sets, cluster validity. Introduction In oil industry, the decision making of exploration and exploitation processes involves dealing with geological, geophysical and geochemical data. The evolution of computer hardware technology and the development of data base management systems (DBMS) allow the data to processed and stored in very huge databases. This fact represents a great opportunity for a company to record all data from its operations. However, it makes the analysis and understanding of such data difficult. Thus, Data Mining techniques may help the experts on their task by automatically discovering useful information from these huge volumes of data. Examples of activities that can be improved by the use of these techniques doi:0.2495/data06007

64 Data Mining VII: Data, Text and Web Mining and their Business Applications are reservoir characterization, prediction and classification of rock porosity, optimal placement of wells and detection of oil spills []. In this work, two well known clustering techniques are employed on 3D seismic data. It was expected that the clusters may help geologists to identify different lithologies on the region studied. The quality of the groups formed is assessed using a cluster validation index called PBM [2]. The clustering algorithms are fuzzy c-means [3] and a neural network approach known as selforganizing map [4]. The dataset was obtained from Namorado oil field in Brazil. This paper is organized as follows: the next section presents the clustering methods used in the application. Section 3 describes the PBM index, used to validate the results generated by the clustering methods and section 4 presents details about the data base and how it was treated. Section 5 shows the results obtained and, in last section, some concluding remarks and suggestions of future research are done. 2 Clustering methods Clustering is one of the most usual tasks in the process of data mining, helping the discovery and identification of distributions and patterns of interest. It aims to organize a data set into groups of similar elements by detecting implicit relationships between attributes recorded on data [5]. The algorithms studied on this work perform partitioning clustering. This kind of clustering splits a data set into a pre-defined amount of groups or partitions. From an initial partition, such techniques search for the best possible splitting until some stopping criteria is reached. A good partitioning must yields groups very different among themselves as well as formed by similar objects. Following, the algorithms here studied are presented. 2. Fuzzy c-means On classic partitioning, a data set X = { x, x2,..., xn} with n objects x i R must be allocated into K groups such that each object belongs to only one group. In other words, given G, G2,..., GK groups, it is verified that G i and G i G j = for every i, j =,..., K, i j. The partition matrix U(X) presents the result, such that each element has u ij = if x i G j and u ij = 0 otherwise. In fuzzy clustering algorithms, an object can be assigned to different groups with partial membership grades u ij [ 0,]. Fuzzy c-means splits a data set into K groups and assigns a membership grades for all objects by optimizing the following objective function: p F n K m u ij i= j= = xi w j 2 ()

Data Mining VII: Data, Text and Web Mining and their Business Applications 65 in which the fuzzifier m is a real value greater than, u ij is the membership grade of object x i to group G j, and w j is the center of G j. As the optimization process continues, the membership grades and the centers of groups are updated by the following expressions: u ij = K k = x x i i w w j k n m uij xi i= j = n m uij i= 2 m (2) w (3) When no more changes occur on partition matrix, the algorithm is stopped. 2.2 SOM neural networks Artificial neural networks are computing models characterized by systems that remind (in some level) the human brain structure. Self-organizing maps (SOM) is a network model which main characteristic is the neighborhood concept; that is, the network learns to recognize neighbor sections in the training data as well as the topology of the training set [4]. Neurons in the SOM s competition layer are originally arranged in fixed positions according to a topology function. It is possible to arrange the neurons topology into rectangular or hexagonal shapes. This network employs competitive learning and the closest neuron to the input data is the output unit. All the neurons from a certain neighborhood of this winner neuron are updated by Kohonen s rule and the value of radius determines the neighborhood space. The neural network determines the winner neuron by X Gw = min{ X G j }, or w = arg min{ X G j }. (4) j j in which X is the input data vector, G j is the synapses vector and w is the index of the winner neuron. The synapses of the winner neuron are adjusted by means of a linear combination between the preceding and the current weight: G j ( t + ) = G j ( t) + hwj ( t)[ X( t) G j ( t)] (5) in which t is the discrete-time coordinate and h wj (t) is the neighborhood kernel. Usually this kernel is calculated as h ( t) = h( r r, t) (6) wj where r w and r j are the radius of the neuron w and j, respectively. The last expression reflects the general definition of a neighborhood kernel. w j

66 Data Mining VII: Data, Text and Web Mining and their Business Applications In this work, it was used the concept of bubble neighborhood, defined by α(t), if neuron j belongs to neuron w neighborhood h wj = (7) 0, otherwise in which α(t) is a monotonically decreasing function of time ( 0 < α ( t) < ). In SOM networks, the number of groups is determined by the network size. In this study, to permit the comparison of the results with the fuzzy c-means clustering, the first SOM layer was formed by only one neuron and the second SOM layer has the number of groups it is applied to. 3 Cluster validation index One of the greatest difficulties of applying a partitioning algorithm is to set the amount of groups and frequently no one knows the correct value with antecedence. Thus, the clustering algorithm is executed with diverse values of groups and the best one can be identified by cluster validation indexes. As mentioned before, a good clustering result exhibits compact and separated groups and these indexes evaluate the groupings based on these features. The measure used here is PBM index: 2 E PBM ( K) = DK, (8) k E in which K is the number of clusters, E K is the sum of intra-cluster distances, E is the sum of distances of all points to the center of data, and D K is the maximum of within-cluster distances. They are given by the following expressions: k n EK = uij d ( x i, w j ), (9) j= i= k DK = max d( w i, w j ). (0) i, j= On these formulas, K u ij is degree of membership of object x i to group with center w j and d is the distance function. The greater is the index value the best is the partition. 4 Seismic data and experiments performed In order to recognize and classify reservoir characteristics, geoscientists analyze the variability in seismic reflections [6]. This activity demands a professional able to identify subtle waveform characteristics and tools to measuring these characteristics are given by seismic attributes. So these attributes are important in reservoir characterization [7] and the correct use of them depends on the experience of a geoscientist, that is, his understanding and interpretation of their behavior. G j

Data Mining VII: Data, Text and Web Mining and their Business Applications 67 The 3D seismic data used in this work are from Namorado oil field at Campos basin in Brazil. The seismic attributes used on this research were created using OpendTect software (www.opendtect.org), which is able to generate and visualize them. The attributes used are amplitude, cosine phase and Hilbert transformation. A dataset with over 223000 records was created using these ones. Both clustering algorithms were run from 2 to 0 partitions in order to identify the best number of groups. As they are randomly initiated (centers coordinates and memberships grades), each one was performed 0 times to obtain the mean results. Table shows the algorithms settings. Therefore the data set was approached by one setting of fuzzy c-means and two of SOM network, with rectangular or hexagonal topology. Table : The settings of algorithms. Parameter Value Fuzzy c-means SOM fuzzifier.5 error tolerance 0. neurons on first layer neurons on second layer from 2 to 0 epochs 0 6 learning rate 0. radius 0 neighborhood function Bubble topology rectangular or hexagonal Table 2: Centers coordinates of groups. Algorithm Coordinates Amplitude Cosine phase Hilbert transform. G G2 G3 G G2 G3 G G2 G3 Fuzzy c-means.476.476.476.895.497.03.7.65.67 SOM rectangular.476.472.476.890.502.09.77.73.7 SOM hexagonal.476.472.476.890.502.09.77.73.7

68 Data Mining VII: Data, Text and Web Mining and their Business Applications 5 Results Figures -3 shows the PBM index variations with the number of groups for the three experiments. As it can be noticed, both clustering algorithms found three groups as the best value for partition the data, which suggests that it is the correct one. Moreover, almost all the values of PBM index were very similar, indicating that the groups formed on this data by fuzzy c-means and SOM networks have comparable quality. Table 2 shows the mean values (after normalization) of the centers coordinates of the three groups for each experiment and it can be seen that they are almost the same. Figure 4 shows a clustering result visualized in OpendTect software and it can be seen three groups (black, white and gray). PBM Index 0.38 0.35 0.32 0.29 0.26 0.23 0.2 0.7 0.4 0. 0.08 2 3 4 5 6 7 8 9 0 Number of groups Exp Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8 Exp9 Exp0 Figure : Results from fuzzy c-means. PBM Index 0.36 0.32 0.28 0.24 0.2 0.6 0.2 0.08 2 3 4 5 6 7 8 9 0 Number of groups Exp Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8 Exp9 Exp0 Figure 2: Results from SOM with rectangular topology.

Data Mining VII: Data, Text and Web Mining and their Business Applications 69 PBM Index 0.3600 0.3200 0.2800 0.2400 0.2000 0.600 0.200 0.0800 2 3 4 5 6 7 8 9 0 Number of groups Exp Exp2 Exp3 Exp4 Exp5 Exp6 Exp7 Exp8 Exp9 Exp0 Figure 3: Results from SOM with hexagonal topology. Figure 4: Visualization of a clustering result. 6 Conclusions This study aimed to show that clustering seismic data may help geologists understand the lithologies of the subsurface. It was generated a dataset with

70 Data Mining VII: Data, Text and Web Mining and their Business Applications thousands records and three relevant seismic attributes and was possible to see that both clustering algorithms, self-organizing maps and fuzzy c-means, achieved very similar partitioning the same number of groups with centers almost coincident. In order to verify the quality of a partitioning, it was employed the PBM cluster validation index. Future works should consider the generation and use of other seismic attributes and the presentation of the results to a specialist to qualify them. Other clustering techniques specific to signal processing will be used too. Acknowledgments This work was partially supported by HP Brazil R&D and the Brazilian research institutions FINEP and CNPq. The authors are also grateful to the Brazilian Petroleum Agency (ANP) who provides the data for this study. References [] Aminzadeh, F., Applications of AI and soft computing for challenging problems in the oil industry. Journal of Petroleum Science and Engineering, 47, pp. 5-4, 2005. [2] Pakhiraa, M.K., Bandyopadhyayb, S., Maulikc, U., Validity index for crisp and fuzzy clusters, Pattern Recognition, 37, pp. 487-50, 2004. [3] Bezdek, J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press: New York, 98. [4] Kohonen, T., Self-Organizing Maps, Springer: New York, 200. [5] Jain, A.K., Murty, M.N., Flynn, P.J., Data Clustering: A Review, ACM Computing Surveys, 3(3), pp. 264-323, 999. [6] Bodine, J.H., Waveform analysis with seismic attributes. Oil & Gas Journal, 84(24), pp. 59-63, 984. [7] Taner, M.T., Koehler, F., Sheriff, R.E., Complex seismic trace analysis, Geophysics, 44, pp. 04-063, 979.