Logging Reservoir Evaluation Based on Spark. Meng-xin SONG*, Hong-ping MIAO and Yao SUN


2017 2nd International Conference on Wireless Communication and Network Engineering (WCNE 2017) ISBN:

Computer Application Technology Research Department, Research Institute of Petroleum Exploration & Development, Beijing, China
*Corresponding author

Keywords: Big data, IBM BigInsights, Spark, Decision tree, Reservoir evaluation.

Abstract. Traditional logging reservoir evaluation methods rely mostly on expertise. However, as data volumes grow rapidly, manual analysis becomes inefficient. As big data technology matures, big data platforms can be used to evaluate logging reservoirs. In this paper, we propose a Spark-based logging reservoir evaluation method. We built a 3-node IBM BigInsights big data platform and used the decision tree algorithm of Spark to evaluate logging reservoirs. We tested the proposed method on a dataset from an oil field in Northwest China, and it accomplished the evaluation. The approach relies little on expertise and is more efficient than traditional evaluation. It can serve as a reference for using big data platforms to evaluate logging reservoirs.

Introduction

The basic idea of reservoir evaluation is to take full advantage of various materials, including well logging data, to evaluate reservoir properties through tasks such as oil-bearing evaluation, layer classification, reservoir parameter prediction, and capacity estimation. Traditional methods for logging reservoir evaluation rely on expertise [1,2]. However, with the steady accumulation of data, the datasets have become too large for manual analysis, which is time-consuming. Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them [3].
In oil exploration and development, a big data platform can help process and analyze large volumes of data. Spark, an important module of such a platform, is a fast and general engine for large-scale data processing. In this paper, we propose a Spark-based logging reservoir evaluation method. We built a 3-node IBM BigInsights big data platform, used the decision tree algorithm of Spark to evaluate logging reservoirs, and tested the proposed method on a dataset from an oil field in Northwest China.

Big Data Related Technologies

In 2012, Gartner defined big data as follows: "Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." [3] More recently, the term "big data" tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.

Spark

Spark is an open-source cluster-computing framework and a fast, general engine for large-scale data processing. According to [4], Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, as shown in Figure 1. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS (Hadoop Distributed File System), Cassandra, HBase, and S3 [5].

Figure 1. Main modules of Spark.

IBM BigInsights Big Data Platform

BigInsights is a collection of value-added services that can be installed on top of the IBM Open Platform with Apache Spark and Apache Hadoop. It provides a complete solution, including Spark, to scale analytics quickly and easily [6]. The architecture of BigInsights is shown in Figure 2. BigInsights includes nearly 20 Apache projects critical to the Hadoop and Spark ecosystems, such as Hive, Sqoop, and HBase. It uses Apache Ambari as the installer, which makes it possible to install only the components that are needed. Figure 3 shows the homepage of Ambari.

Figure 2. The architecture of BigInsights.


Figure 3. The homepage of Ambari.

Spark Based Logging Reservoir Evaluation

Decision Tree Algorithm

A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed [7]. The final result is a tree with decision nodes and leaf nodes. The core algorithm for building decision trees is ID3, by J. R. Quinlan, which employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree.

Experiment Environment Configuration

Using the VMware virtualization platform, we built a 3-node IBM BigInsights big data platform. Each virtual machine has 16 GB of memory, 8 vCPUs, and a 100 GB disk, and runs RedHat 6.6. We used Apache Ambari [6] as the installer, which makes it possible to install only the components that are needed. The version of Ambari is [...]. One virtual machine is used for the Ambari server, and the other 2 virtual machines are used for Ambari agents. The architecture of the experiment environment is shown in Figure 4. The Scala version is [...], and the IntelliJ IDEA version is [...].

Figure 4. Experiment Environment Architecture.
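As a concrete illustration of the Entropy and Information Gain measures that ID3 uses (see Decision Tree Algorithm above), the following is a minimal sketch in plain Scala with no Spark dependency. The object name `Id3Measures`, the method names, and the idea of passing each row as an (attribute value, class label) pair are assumptions made for this example and do not come from the paper. ID3 in its basic form splits on categorical attributes, so continuous logging readings would first have to be discretized. MLlib's own decision tree handles continuous features by binning them.

```scala
object Id3Measures {
  // Base-2 logarithm, so that entropy is measured in bits.
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  /** Shannon entropy of a collection of class labels: -sum(p * log2(p)). */
  def entropy[L](labels: Seq[L]): Double = {
    val n = labels.size.toDouble
    if (labels.isEmpty) 0.0
    else labels.groupBy(identity).values.map { group =>
      val p = group.size / n // relative frequency of one class
      -p * log2(p)
    }.sum
  }

  /**
   * Information gain of splitting on one attribute.
   * Each row is (attribute value, class label). The gain is the entropy of
   * all labels minus the size-weighted entropy of each attribute-value subset.
   */
  def informationGain[A, L](rows: Seq[(A, L)]): Double = {
    val n = rows.size.toDouble
    val parentEntropy = entropy(rows.map(_._2))
    val weightedChildEntropy = rows.groupBy(_._1).values.map { subset =>
      (subset.size / n) * entropy(subset.map(_._2))
    }.sum
    parentEntropy - weightedChildEntropy
  }
}
```

At each node, ID3 would evaluate `informationGain` for every candidate attribute and split on the one with the largest gain. For example, `Id3Measures.informationGain(Seq(("high", "oil layer"), ("low", "water layer")))` scores a hypothetical discretized attribute that separates the two classes perfectly.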


Experiment and Analysis

The main task of reservoir evaluation is to use the original data collected by well logging to evaluate reservoir properties, such as porosity, oil saturation, and lithological character, and finally to give a comprehensive evaluation [8]. We tested the proposed algorithm on a dataset from an oil field in Northwest China. The original data collected by well logging mainly consists of well logging curves, such as the spontaneous potential curve (SP), gamma ray curve (GR), resistivity curves (RT, RI, RXO), density log (DEN), acoustic log (AC), and compensated neutron log (CNL). The experimental data contains 7 classes, such as oil layer, water layer, and dry layer. The experimental data is shown in Table 1.

Table 1. Experimental data demonstration.
NO.    sp    ac    rt    rt1    rxo    por    perm    sw    sh    conclusion
[...]                                                             water layer
[...]                                                             water layer
[...]                                                             gas layer
[...]                                                             oil layer
[...]                                                             water layer

MLlib is Apache Spark's scalable machine learning library. In our experiment, we use the decision tree algorithm of MLlib. The workflow of the proposed method has three steps: first, upload the data to HDFS; second, build a decision tree; third, optimize the parameters of the decision tree. The decision tree is built in Scala. The results of the experiment are reported using the following measures.

(1) Confusion matrix. A confusion matrix is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class.

(2) Precision and recall. Precision is the fraction of relevant instances among the retrieved instances, while recall is the fraction of relevant instances that have been retrieved over the total number of relevant instances.

(3) The precision and recall for each class.
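The three-step workflow above (data in HDFS, decision tree training, parameter tuning) and the evaluation measures could be sketched with the RDD-based MLlib API roughly as follows. This is not the authors' code, which is not reproduced in this text. The HDFS path, the headerless comma-separated layout, the 70/30 split, and the parameter values (`"entropy"`, depth 5, 32 bins) are all assumptions chosen for illustration. Only the column order is taken from the header of Table 1. The sketch needs Spark on the classpath and is meant to be submitted to a cluster rather than run standalone.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

object ReservoirTreeSketch {
  // Assumed layout per line, following the Table 1 header:
  // NO., sp, ac, rt, rt1, rxo, por, perm, sw, sh, conclusion
  def parseRow(line: String): (String, Vector) = {
    val cols = line.split(",").map(_.trim)
    val features = Vectors.dense(cols.slice(1, 10).map(_.toDouble)) // sp .. sh
    (cols(10), features)                                            // conclusion
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReservoirTreeSketch"))

    // Step 1: read the data previously uploaded to HDFS (hypothetical path).
    val rows = sc.textFile("hdfs:///user/demo/logging/reservoir.csv")
      .map(parseRow).cache()

    // MLlib expects class labels 0.0, 1.0, ...; index the layer names.
    val labelIndex: Map[String, Double] = rows.map(_._1).distinct().collect()
      .sorted.zipWithIndex.map { case (name, i) => name -> i.toDouble }.toMap
    val data = rows.map { case (name, f) => LabeledPoint(labelIndex(name), f) }
    val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

    // Steps 2 and 3: train the tree; impurity, maxDepth and maxBins are the
    // parameters one would tune (the values here are arbitrary starting points).
    val numClasses = labelIndex.size
    val categoricalFeaturesInfo = Map[Int, Int]() // all features are continuous
    val model = DecisionTree.trainClassifier(
      training, numClasses, categoricalFeaturesInfo, "entropy", 5, 32)

    // Evaluation: confusion matrix plus per-class precision and recall.
    val predictionAndLabels = test.map(p => (model.predict(p.features), p.label))
    val metrics = new MulticlassMetrics(predictionAndLabels)
    // MLlib documents predicted classes as columns, i.e. the transpose of the
    // row/column convention described in the text above.
    println(metrics.confusionMatrix)
    metrics.labels.foreach { l =>
      println(s"class $l precision=${metrics.precision(l)} recall=${metrics.recall(l)}")
    }
    sc.stop()
  }
}
```

Tuning in this setting would typically mean retraining with different `maxDepth`, `maxBins`, and impurity settings and comparing the resulting per-class precision and recall on held-out data.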

The experimental results show that the proposed method can accomplish the evaluation of logging reservoirs. Compared with traditional methods, it relies little on expertise and is more efficient.

Conclusion

In this paper, we proposed a Spark-based logging reservoir evaluation method. We built a 3-node IBM BigInsights big data platform and used the decision tree algorithm of Spark to evaluate logging reservoirs. Compared with traditional methods, the proposed method relies little on expertise and is more efficient. It provides a reference for petroleum companies that want to use big data technology to solve their production problems.

References

[1] S. D. Mohaghegh, A new methodology for the identification of best practices in the oil and gas industry, using intelligent systems, Journal of Petroleum Science and Engineering, 49 (2005).
[2] Y. L. Ren and Y. T. Ren, A framework of data mining for logging reservoir evaluation, International Conference on Service Systems & Service Management, 2016, 1-6.
[3]
[4]
[5]
[6]
[7] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
[8] H. Q. Liu, Principle and Application of Well Logging, Petroleum Industry Press.
