CHAPTER 3 Preprocessing and Feature Extraction Techniques

3.1 Need for Preprocessing and Feature Extraction Schemes for Pattern Recognition and Discrimination

With the advent of fast computers equipped with large storage and data acquisition systems, researchers in diverse application areas such as engineering, medicine, geophysics and astronomy face the problem of processing and analyzing huge volumes of observations for decision-making. Such datasets, in contrast with the smaller and more traditional datasets that have been studied extensively in the past, present new challenges in data analysis. Though traditional statistical methods provide reasonably good solutions in general, an increase in the number of samples in real-time studies is usually accompanied by an increase in the number of variables measured on each observation. The dimension of the data, that is, the number of variables measured on each observation, hence plays a vital role in processing such datasets.

In general, dimensionality reduction may be accomplished either by eliminating data that are closely related to the rest of the data in the set or by merging data into a smaller set of representative features. The trade-off between the accuracy offered by the complete dataset and the computational cost of retaining all parameters without feature extraction or selection plays a vital role in pattern recognition performance. This trade-off leads to the aspect of pattern recognition known as the curse of dimensionality.

Preprocessing refers to feature (attribute) construction from raw data using techniques such as standardization (scaling), normalization (centering), extraction of local attributes (kernel or syntactic methods) and attribute discretization (discrete and finite sets). The role of feature extraction, in contrast, is to acquire the most appropriate set of information from the original data so that it can be represented in a lower-dimensional space, based on feature selection, assessment of an evaluation criterion and so on. Since both components play a vital role in pattern recognition, an appropriate choice of preprocessing and feature extraction is essential for obtaining optimal results in the classification task.

3.2 Methods of Feature Extraction and Its Relevance to PD Pattern Recognition: An Overview

Data from the PD measurement and acquisition system that best describe the dynamics of discharge patterns are obtained as φ-q-n characteristics. Though several researchers have utilized both the phase resolved and the time resolved approaches for PD pattern recognition of insulation, this study resorts to the former, since it has been observed that each discharge pulse reflects the physical process at the discharge site and a strong relationship has been established between the nature of the patterns and the type of defect. In this research the data describing the source of PD are characterized into the following categories based on the phase window technique:

1. Measures based on maximum values of q (10° and 30°);
2. Measures based on minimum values of q (10° and 30°);
3. Measures based on central tendency (10° and 30°);
4. Measures based on types of mean values (10° and 30°);
5. Measures based on mean-slope-angle (10° and 30°);
6. Measures based on statistical moments; and
7. Measures based on the Two Pass Split Window (TPSW) scheme (10° and 30°).

3.3 Types of Feature Extraction Techniques for PD Pattern Recognition

3.3.1 Measures based on Basic Statistical Operators and Inequality Relationship of Types of Mean Measures

These measures are primarily used for classification of PD patterns utilizing the phase resolved PD (PRPD) approach. Basic univariate statistical analysis is used to obtain the statistical operators representing the distribution of the φ-q-n patterns. Initially, the phase window representation of the pulse distribution is used to compute the basic statistical operators describing central tendency (mean, median and mode), dispersion (standard deviation, quartile deviation and range) and the range of discharge values (maximum and minimum quantities), in order to ascertain the role played by the distribution of pulses in the PD signatures studied. Throughout the analysis, phase windows of 30° and 10° have been used, since these window sizes were found to give a reasonably good indication of the issues related to the curse of dimensionality during the training phase. The following measures are considered during the analysis:

1. Measures based on maximum and minimum values:
(a) φ-q_max-n (phase window of 10°)
(b) φ-q_min-n (phase window of 30°)
(c) φ-q-n_max (phase window of 10°)
(d) φ-q-n_min (phase window of 30°)
2. Measures of Central Tendency (magnitude of q and number of occurrences n):
(a) Mean, Median and Mode (10° window)
(b) Mean, Median and Mode (30° window)
3. Measures of Dispersion (magnitude of q and number of occurrences n):
(a) Range, Variance, Standard Deviation and Quartile Deviation (10° window)
(b) Range, Variance, Standard Deviation and Quartile Deviation (30° window)

A new approach that uses statistical measures based on types of mean and their inequality relationship has also been exploited in this research. This approach has recently been used by a few researchers in allied fields of engineering [75], who report significant success: it is observed to be an effective technique for reducing dimensionality while nevertheless conserving the relationship among the input feature vectors. Hence an attempt has also been made in this research to ascertain the effectiveness of the proposed inequality expression in providing a compact set of extracted features. The following phase window measures have been taken up for analysis, each for 30° and 10° windows: Harmonic Mean (HM), Geometric Mean (GM), Arithmetic Mean (AM) and Root Mean Square.
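The phase window measures listed above can be illustrated with a minimal Python sketch. It assumes the PD data arrive as two parallel arrays, the phase angle of each pulse in degrees and its discharge magnitude q; the function names `phase_window_features` and `mean_types`, the dictionary layout, and the use of NaN for empty windows are illustrative choices rather than anything specified in this chapter. Mode and quartile deviation are omitted for brevity. For strictly positive values the four means satisfy HM ≤ GM ≤ AM ≤ RMS, which is the inequality relationship the text refers to.

```python
import numpy as np

def phase_window_features(phase_deg, q, window_deg=30):
    """Per-window statistics of discharge magnitude q over one 0-360 degree cycle.

    Hypothetical helper: returns a dict of arrays, one entry per phase window.
    Windows that contain no pulses are reported as NaN (count n is 0).
    """
    phase_deg = np.asarray(phase_deg, dtype=float) % 360.0
    q = np.asarray(q, dtype=float)
    n_windows = int(round(360 / window_deg))
    # Index of the phase window that each pulse falls into.
    idx = np.minimum((phase_deg // window_deg).astype(int), n_windows - 1)

    names = ("n", "q_max", "q_min", "mean", "median", "range", "std")
    out = {name: np.full(n_windows, np.nan) for name in names}
    out["n"][:] = 0
    for w in range(n_windows):
        qw = q[idx == w]
        out["n"][w] = qw.size
        if qw.size == 0:
            continue
        out["q_max"][w] = qw.max()
        out["q_min"][w] = qw.min()
        out["mean"][w] = qw.mean()
        out["median"][w] = np.median(qw)
        out["range"][w] = qw.max() - qw.min()
        out["std"][w] = qw.std()  # population standard deviation
    return out

def mean_types(x):
    """Return (HM, GM, AM, RMS) of strictly positive values."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("harmonic and geometric means need positive values")
    hm = x.size / np.sum(1.0 / x)
    gm = np.exp(np.mean(np.log(x)))
    am = np.mean(x)
    rms = np.sqrt(np.mean(x ** 2))
    return hm, gm, am, rms

# Toy data: five pulses, 30 degree windows (12 windows per cycle).
feats = phase_window_features([5, 15, 25, 40, 200], [1, 2, 4, 3, 8], window_deg=30)
hm, gm, am, rms = mean_types([1, 2, 4])
```

With a 10° window the same call yields 36 values per statistic instead of 12, which shows directly how the window size sets the dimension of the feature vector.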

3.3.2 Measures based on Higher Order Statistical Moments as Mathematical Descriptors

Since several studies [11] have used the traditional statistical operators as mathematical descriptors for PD pattern recognition with a considerable level of success, this study also analyzes the role played by the descriptors mean, standard deviation, kurtosis, skewness, cross-correlation and modified cross-correlation in obtaining features that describe the pulse pattern signatures. These descriptors have been computed for 10° and 30° phase windows.

3.3.3 Measures based on Two Pass Split Window (TPSW) Filter Technique

Recently, a few studies have effectively used the TPSW technique for preprocessing and feature extraction in diverse fields such as speech recognition [76], sonar signal processing [77] and target recognition [78]. The scheme earlier found wide application in audio signal processing, where the primary focus is to reduce the influence of spurious noise artifacts that are invariably introduced during recording or playback, while preserving the original sound to a considerable extent. It has also been reported recently that the scheme has been successfully used in the classification of underwater sonar signals radiated by ships. As delineated in [39], the spectrum of such radiated sonar signals consists, in essence, of two spectral types:

1. Broad-band, which comprises a continuous spectrum; and
2. Narrow-band, which has a discontinuous spectrum containing components occurring at discrete frequencies, wherein the long pulses are filtered from the tonal.

Effective extraction of tonal features from mixed spectra is therefore the essence of applications involving classification of sonar signals. This task is analogous to extracting the discontinuous pulsating components of the PD patterns, which correspond well to the tones of the radiated sonar spectrum.

The TPSW filtering scheme provides a mechanism for obtaining smooth local-mean estimates of the signal notwithstanding the presence of spurious noisy spikes in the analyzed signal. The basic concept involves carrying out a simple moving-average filtering over a segment of the signal comprising a long pulse. In this approach, the continuous spectrum is estimated first and the tonal components are subsequently extracted. The algorithm for the TPSW scheme is indicated in Figure 3.1.

Step 1: For the signal f, select a window centered on bin k:
$$R_k = \{k-M,\ k-M+1,\ \ldots,\ k,\ \ldots,\ k+M-1,\ k+M\}$$
(the number of bins in the window is $2M+1$).

Step 2: First pass: compute the local mean
$$\bar{f}(k) = \frac{1}{2M+1} \sum_{i=k-M}^{k+M} f(i)$$

Step 3: Form a clipped sequence (to avoid biasing the estimate of the local mean due to a tonal):
$$g(k) = \begin{cases} f(k), & f(k) \le \alpha \bar{f}(k) \\ \bar{f}(k), & f(k) > \alpha \bar{f}(k) \end{cases}$$
where α is a constant usually set to 0.5 and serves as an a-priori estimate.

Step 4: Second pass: obtain the continuous spectrum (broad-band component) by calculating the local mean using g(k):
$$m(k) = \frac{1}{2M+1} \sum_{i=k-M}^{k+M} g(i)$$

Step 5: Compute the narrow-band component h(k):
$$h(k) = f(k) - m(k)$$

Figure 3.1: Algorithm for TPSW Feature Extraction Scheme
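The five steps of Figure 3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `local_mean` and `tpsw` are hypothetical, the edge bins are handled by repeating the end values (the figure does not say how edges are treated), and the narrow-band component is taken as the difference f(k) − m(k), which is how the final step is read here.

```python
import numpy as np

def local_mean(x, M):
    """Moving average over 2M+1 bins centered on each bin.

    Assumption: the signal is padded by repeating its end values so that
    the output has the same length as the input.
    """
    padded = np.pad(np.asarray(x, dtype=float), M, mode="edge")
    kernel = np.ones(2 * M + 1) / (2 * M + 1)
    return np.convolve(padded, kernel, mode="valid")

def tpsw(f, M, alpha=0.5):
    """Two-pass estimate of the broad-band (m) and narrow-band (h) parts of f."""
    f = np.asarray(f, dtype=float)
    f_bar = local_mean(f, M)                    # Step 2: first-pass local mean
    g = np.where(f > alpha * f_bar, f_bar, f)   # Step 3: replace large values by the local mean
    m = local_mean(g, M)                        # Step 4: second-pass local mean (broad-band)
    h = f - m                                   # Step 5: narrow-band component (assumed difference)
    return m, h

# Toy spectrum: flat background of 1 with a single tonal of height 11 at bin 10.
f = np.ones(21)
f[10] = 11.0
first_pass = local_mean(f, M=2)
m, h = tpsw(f, M=2, alpha=0.5)
```

In this toy case the first-pass mean at the tonal is pulled up to 3 by the tonal itself, whereas the second-pass estimate is about 1.4, so the extracted narrow-band value is closer to the true height above the background. That reduction in bias is the purpose of the clipping step.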

3.4 Summary

The need for preprocessing and feature extraction in pattern recognition studies, the relevance of data compaction and dimensionality reduction, and the Phase Resolved PD (PRPD) based phase window approach have been deliberated. An insight has been provided into the various preprocessing and feature extraction techniques: simple statistical measures (central tendency, dispersion and range), stochastic measures based on higher order moments (skewness, kurtosis, correlation and cross-correlation), measures based on the statistical inequality expression for types of mean (arithmetic mean, harmonic mean, geometric mean and root mean square) and signal processing based measures (TPSW scheme). The following major aspects are summarized:

1. The need for a feature extraction approach that becomes progressively and sequentially more complex depending on the nature of the PD signature datasets is evident, necessitating methodologies that range from simple statistical measures to signal processing based measures.
2. The relevance of the novel TPSW scheme for feature extraction in PD pattern recognition is expounded. Its ability to discriminate signals in terms of two frequency bands (during two different phases of the algorithm) makes the scheme relevant for extraction of appropriate features pertaining to PD pattern signatures.
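As a supplementary illustration of the higher order moment descriptors mentioned above (mean, standard deviation, skewness, kurtosis and cross-correlation), the following Python sketch computes them for the values in one phase window. Several choices here are assumptions, since the chapter does not give the formulas: population (biased) moments are used, kurtosis is the plain fourth standardized moment (3 for a normal distribution, not the excess form), and cross-correlation is taken as the zero-lag Pearson correlation coefficient. The modified cross-correlation is not defined in the text and is therefore left out. The function names are hypothetical.

```python
import numpy as np

def moment_descriptors(x):
    """Mean, standard deviation, skewness and kurtosis of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation
    if sigma == 0:
        # Skewness and kurtosis are undefined for a constant sample.
        return {"mean": mu, "std": 0.0, "skewness": np.nan, "kurtosis": np.nan}
    z = (x - mu) / sigma
    return {
        "mean": mu,
        "std": sigma,
        "skewness": np.mean(z ** 3),  # third standardized moment
        "kurtosis": np.mean(z ** 4),  # fourth standardized moment (non-excess)
    }

def cross_correlation(x, y):
    """Zero-lag normalized cross-correlation (Pearson coefficient) of two samples."""
    return float(np.corrcoef(np.asarray(x, dtype=float),
                             np.asarray(y, dtype=float))[0, 1])

# Toy window: a symmetric sample has zero skewness.
d = moment_descriptors([1, 2, 3, 4, 5])
r = cross_correlation([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```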